ABSTRACT


This thesis presents the evaluation of work distribution 
algorithms and hardware topologies in a multi-Transputer 
network. The primary emphasis concerns a work distribution 
algorithm known as "workfarm" that is effective on problems
that are divisible into independent work packets. 

All the programs and examples presented in this thesis 
were implemented in the OCCAM programming language, using 
the Transputer Development System, D700C, Beta 2.0 March
1987 compiler version.
The reader is cautioned that computer programs developed
in this research may not have been exercised for all cases 
of interest. While every effort has been made, within the 
time available, to ensure that the programs are free of 
computational and logic errors, they cannot be considered
validated. Any application of these programs without
additional verification is at the risk of the user.

Many terms used in this thesis are registered trademarks 
of commercial products. Rather than attempting to cite each 
individual occurrence of a trademark, all registered
trademarks appearing in this thesis are listed below the 
firm holding the trademark: 

INMOS Limited, Bristol, United Kingdom

Transputer
OCCAM
IMS T414
IMS T800
Transputer Development System (TDS)
A. BACKGROUND

The driving force behind modern weapon systems is the
processor. At present, most weapon systems utilize a
traditional single-CPU "serial" computer for their processing.
Unfortunately, such a computer can process a certain amount
of data in a fixed amount of time and no more. A modern
radar or sonar can overwhelm the system processor with a
flood of raw data. If only part of that data is processed,
the rest is lost forever. A new, more powerful computer is
required to handle more data. This thesis is part of an
effort to improve the processing power of weapon systems.

Multiple computers working in parallel offer significant
increases in processing power in a way that provides
flexibility and expandability. If one needs more processing
power, one adds more computers.

The biggest problem in parallel multi-computer networks
is interprocessor communication. This communication is
usually handled in one of two ways. The traditional method has
been to connect multiple computers together by way of a
shared bus. The computers communicate by leaving messages
in a single shared memory. The shared memory and shared bus
create bottlenecks, however. Memory also creates a
bottleneck, since bus bandwidth is higher than memory bandwidth.
The number of computers that can be attached is limited by
the bus bandwidth and/or memory bandwidth. A second approach
is to have computers communicate with each other by passing
messages along direct links. In such systems, each computer
has its own memory.

This thesis concentrates on the latter method, which
is attractive for use in a weapon system for reasons of
flexibility, growth potential, fault tolerance, lower weight,
better response time, and higher system availability.

Microprocessors would seem well suited for a parallel
multi-computer network in a weapon system as they are
inexpensive, small, lightweight, and increasingly powerful.
Until now, however, microprocessors were designed to operate
as stand alone computers. Parallelism, with its requirement
for communication between computers, was a difficult
problem. Since they were not originally intended for this role,
their architectures were not suited for it. If they had any
provisions for communications at all, they were added on as
an afterthought.

A microprocessor family called the Transputer, designed
from the ground up for parallelism, is now available from
INMOS corporation. It features four full duplex serial
communication links and a language also specifically designed
for parallel processing. The Transputer is ideal as a
building block for a parallel system of microprocessors.
The evaluation of any computer system, be it
single or multi-processor, depends upon the problem. This
thesis evaluates a network of Transputers working on a
problem that can be broken up into independent work packets.
Each packet contains the parameters required to process part
of the problem. Packets may be combined into a bundle to make
communications between nodes more efficient. The amount of
computation per packet may be very small or very large and
may vary depending on the particular packet.

A critical part of any parallel multi-computer network
is the work distribution algorithm. Such an algorithm known
as the Workfarm is simple and very effective for many
problems including Ray Tracing and Mandelbrot Set and is used
in this thesis [MaSh87].

With four links on each Transputer, many physical
topologies for a network are possible. To make communication and
configuration software less complex, a very regular and
symmetric topology is often desirable. There are many such
topologies available, from a simple linear array or binary
tree to more exotic designs like hypercube or hypernet
[HwGh87].

The time it takes for a network of Transputers to
complete a problem using a workfarm is primarily dependent on
the following factors: the number and speed of the
Transputers, the number of computations per packet, the number of
packets per bundle, and the total number of packets in the
problem.

This thesis will examine how these factors are
interrelated. The two primary questions addressed are, given a
workfarm, 1) how many Transputers will be required to solve
a problem given a time limit, and 2) how fast can a problem
be solved for a given number of Transputers?

The resulting predictions will be compared to actual
results on two specific problems: Mandelbrot Set [Po86] and
Coordinate Transformation [Ri87]. These problems can be
characterized by their divisibility into work packets which
can be processed in any order and by their massive
computational requirements.


B. THESIS OVERVIEW

Chapter II presents a brief look at the Transputer
system. Chapter III discusses the workfarm, how it is
implemented on a network of Transputers, and what topology is
used. Chapter IV presents the results of timing studies
using the workfarm. Chapter V uses the findings of Chapter
IV to predict the performance of the Coordinate
Transformation and Mandelbrot Set problems and compares them with the
actual results. Chapter VI presents the conclusions and
recommendations of the thesis.
A. GENERAL

It is hard to separate the Transputer hardware and Occam
software. They are so tightly intertwined that it is easier to
treat them as a single entity: the Transputer system. A
basic understanding of this system is necessary when
programming a network of Transputers. Although this learning
may seem onerous, it may well be the reason Transputers are
relatively easy to program in parallel compared with other
microprocessors and languages. The reader is assumed to be
familiar with the fundamentals of the Transputer
architecture and the Occam programming language. For those
unfamiliar with Transputers, [In88] and [PoMa87] are excellent
starting points. This chapter highlights the knowledge
necessary for efficient parallel processing and performance
maximization.

Despite the tremendous power of a network of Transputers, a
program has to be as efficient as possible to realize the
full potential. To do so, the programmer must first ensure
that each individual Transputer is optimized and then that
the network is. The latter is primarily a matter of work
distribution and communication. The following is a synopsis
of Transputer performance maximization; a complete reference
can be found in [At87].


B. SINGLE TRANSPUTER OPTIMIZATION 

It is extremely desirable to keep all the data struc- 
tures and code in the on-chip memory. An on-chip memory 
cycle takes one processor cycle (50 nanoseconds for the 
latest 20MHz Transputers). Although variable, depending on 
the instruction and on the speed of the DRAM chips, external 
memory references usually take five processor cycles 
[In87a]. In other words, one external memory access takes 
five times as long as an on-chip memory access. 
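The cost difference can be made concrete with a little arithmetic. The following Python sketch is only an illustrative model, not code from this thesis; it uses the 50 nanosecond cycle and the one-cycle and five-cycle access figures quoted above, and the function name is an assumption made for the example.

```python
# Illustrative cost model; cycle counts are the figures quoted in the text.
CYCLE_NS = 50          # one processor cycle on a 20 MHz Transputer
ON_CHIP_CYCLES = 1     # on-chip memory access
EXTERNAL_CYCLES = 5    # typical external memory access, per [In87a]


def access_time_ns(on_chip_accesses, external_accesses):
    """Total memory access time in nanoseconds for a mix of accesses."""
    cycles = (on_chip_accesses * ON_CHIP_CYCLES
              + external_accesses * EXTERNAL_CYCLES)
    return cycles * CYCLE_NS


if __name__ == "__main__":
    print(access_time_ns(1000, 0))   # all accesses on-chip
    print(access_time_ns(0, 1000))   # all accesses external
```

Under these figures, a loop making 1000 data accesses spends 50 microseconds on memory when the data is on-chip and 250 microseconds when it is external.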

Given a choice between data structures on-chip or
program code on-chip but not both, one would choose data
structures. A Transputer word holds four bytes; one memory access
of program code returns four single byte instructions. This,
combined with the principle of locality, means accessing
program instructions from external memory is not nearly as
slow as accessing data structures from external memory.

Transputer memory is arranged as follows, from the base
of memory upwards: system space, data space, code space,
with any unused space on top. Off chip (external) memory is
not allotted until there is no free on-chip memory
remaining. The Occam compiler and Transputer loader software
automatically place data (data structures, workspaces, etc.)
lower than program code on the Occam map [At87]; hence,
program code only resides in on-chip memory if the data
space has already been accommodated.


The programmer has control over the order in which
data structures occur in memory through the order in which
they are declared in the code. Simply put, data structures
declared last in a local block are placed lower in the
memory map than their predecessors.

Large data structures, such as arrays, should be
declared first in a local block, followed by any variables, so
that the variables are allotted memory lower in the map,
ahead of the large arrays. Otherwise, frequently used
variables may be allotted off chip memory. To ensure large data
structures do not monopolize on-chip memory, they can be
declared in a global process. In keeping with proper
programming practices, i.e., declaring data structures locally,
the global data structures can be artificially declared
locally using abbreviations with no performance cost.
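The effect of declaration order can be sketched with a small model. The Python below is not the Occam compiler's allocator; it is a simplified illustration that assumes only the rule stated above (the last declaration in a block receives the lowest address), ignores system space and process workspaces, and uses a hypothetical 2048 byte on-chip size. The names `layout`, `big.array`, `i`, and `sum` are invented for the example.

```python
def layout(declarations, on_chip_bytes):
    """Place declarations in a simplified memory map.

    declarations: list of (name, size_in_bytes) in source order.
    Returns {name: (offset, fits_on_chip)}. The LAST declaration is
    given the lowest offset, following the rule described in the text.
    """
    placement = {}
    offset = 0
    for name, size in reversed(declarations):
        fits = offset + size <= on_chip_bytes
        placement[name] = (offset, fits)
        offset += size
    return placement


if __name__ == "__main__":
    ON_CHIP = 2048  # hypothetical on-chip size in bytes

    # Large array declared first: the scalars land at the bottom, on-chip.
    good = layout([("big.array", 4000), ("i", 4), ("sum", 4)], ON_CHIP)
    # Large array declared last: it takes the bottom and pushes scalars out.
    bad = layout([("i", 4), ("sum", 4), ("big.array", 4000)], ON_CHIP)
    print(good)
    print(bad)
```

In the first ordering both scalars stay on-chip and only the array spills; in the second, the array occupies the low addresses and the frequently used scalars end up in external memory.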

Concurrent process workspaces also reside in the data
portion of the memory map. The programmer controls which
processes lie lower in the memory map by the order in which
they are declared. Like data structures, those processes
declared last are allotted workspaces lower in the memory
map.

To fine-tune the Transputer further, awareness of where
variables lie relative to the workspace pointer is useful.
Only a single byte instruction is required to manipulate the
first 16 locations above the workspace pointer because the
four bit relative address fits in the lower half of the
single byte instruction. This optimization technique is
useful in a local block with more than 16 variables.
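The size penalty for variables beyond the first 16 locations can be estimated as follows. This Python sketch assumes that operands wider than four bits are built up with one additional prefix byte per extra four bits, which is how the Transputer instruction encoding extends operands; it handles only non-negative offsets, and the function name is an assumption made for illustration.

```python
def instruction_bytes(offset):
    """Bytes needed for one instruction whose operand is `offset`.

    Each instruction byte carries four operand bits; every further
    four bits of operand costs one extra prefix byte.
    """
    if offset < 0:
        raise ValueError("only non-negative workspace offsets are modeled")
    n_bytes = 1
    while offset >= 16:
        offset >>= 4
        n_bytes += 1
    return n_bytes


if __name__ == "__main__":
    for off in (0, 15, 16, 255, 256):
        print(off, instruction_bytes(off))
```

Under this model, a variable at offset 15 is reached with a one byte instruction, while the same access at offset 16 needs two bytes, so the most frequently used variables belong in the first 16 locations.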


C. MULTIPLE TRANSPUTER OPTIMIZATION

A key to maximizing the performance of a network of
Transputers is decoupling communication from calculation.
This is accomplished by running communication and
calculation processes separately and in parallel. Other processes,
also running in parallel, act as buffers between the
communication and calculation processes. The processes
communicate among themselves through internal or external channels.
With such a setup, a Transputer may now communicate and
calculate simultaneously, with little degradation in either
area.

Equally important, the communication processes must
always run at high priority and the calculation processes at
low priority. Consider a network of Transputers. If each
Transputer only passes messages when it has finished its own
calculation, the majority of the Transputers in the network
will constantly lie idle, waiting to receive work or to send
results. The communication must take precedence so that the
message passing is uninhibited.

Because the communication and calculation are decoupled
in each Transputer, the communication does not significantly
affect the ongoing calculation. How much calculation
degradation actually occurs during ongoing communications is
researched in [Ha87] and [Br88]. In a nutshell, a single
Transputer communicating on all four links at full capacity,
which is the worst case scenario, can still calculate at
approximately 75% of its maximum capability [Ha87]. The
separate DMA engines on the chip make this possible,
although the degradation is caused by internal bus contention
between the link engines and the central processor.

The actual link data rates have been investigated
previously in [Va87] and [Br88]. To summarize, one can expect
rates of approximately 2.3 Mbytes/second through T800 links
during bidirectional communication and 1.7 Mbytes/second
during unidirectional communication. The T414 has rates of
1.5 Mbytes/second during bidirectional communication and
0.76 Mbytes/second during unidirectional communication
[Br88]. These rates assume no external memory usage. The
T800 communicates significantly faster than the T414 because
of a handshaking improvement in link communication [In87b].
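These rates translate directly into a rough estimate of message transfer time. The Python sketch below is illustrative only: it assumes 1 Mbyte is 10^6 bytes, treats the quoted figure as the throughput available to the message, and ignores per-message setup overhead. The table and function names are assumptions made for the example.

```python
# Approximate link rates in Mbytes/second, as quoted in the text.
LINK_RATE_MBYTES = {
    ("T800", "bidirectional"): 2.3,
    ("T800", "unidirectional"): 1.7,
    ("T414", "bidirectional"): 1.5,
    ("T414", "unidirectional"): 0.76,
}


def transfer_time_us(n_bytes, chip, mode):
    """Estimated time in microseconds to move n_bytes over one link.

    With 1 Mbyte taken as 10**6 bytes, bytes divided by Mbytes/second
    gives microseconds directly.
    """
    return n_bytes / LINK_RATE_MBYTES[(chip, mode)]


if __name__ == "__main__":
    # A 100-integer message (400 bytes) on each kind of link.
    for key in LINK_RATE_MBYTES:
        print(key, round(transfer_time_us(400, *key), 1))
```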

A third area in which a programmer can significantly 
affect performance is the length of the communication mes- 
sage. The overhead to send a single integer over a link is 
the same as that to send an array of 100 integers, for ex- 
ample. Obviously it is better to keep messages as long as 
possible and cut down on the overhead. 

As the message length grows, however, the probability
increases that its data structure will reside in off chip
memory. That of course means significant performance
degradation.


The trick is to keep the message as long as possible
without going into off chip memory. The programmer can
figure out the happy medium by keeping track of how much data
structure space he is using and making sure it is less than
the on-chip memory size. If it is more, the programmer has
to shorten his message arrays until all the data structures
fit in on-chip memory.
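The bookkeeping described above amounts to a simple budget calculation. The following Python sketch is a rough model under stated assumptions: a four byte word, a fixed number of equally sized message arrays, and a caller-supplied figure for all other data space. The function and parameter names are hypothetical.

```python
def max_message_words(on_chip_bytes, other_data_bytes, message_buffers,
                      bytes_per_word=4):
    """Longest message array (in words) that keeps all data on-chip.

    on_chip_bytes:    on-chip memory available for data
    other_data_bytes: space used by variables, workspaces, and so on
    message_buffers:  number of message arrays of equal length
    """
    free = on_chip_bytes - other_data_bytes
    if free <= 0 or message_buffers <= 0:
        return 0
    return free // (message_buffers * bytes_per_word)


if __name__ == "__main__":
    # Hypothetical budget: 4096 bytes on-chip, 1024 bytes already used,
    # four message arrays (two work buffers, two result buffers).
    print(max_message_words(4096, 1024, 4))
```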


D. MISCELLANEOUS TIPS 
As pointed out in [Br88], the timer should not be
utilized in the B004 T414 Transputer. It does not return an
accurate measure of time because the T414 is executing
processes that the user is not aware of and these hidden
processes figure into any timing measurements. The timer should
be utilized on a remote Transputer and the time returned
over a link for display.
Occam is very strongly typed. In a network of
Transputers, where message passing is paramount, this comes heavily
into play. The type of data at one end of a channel must be
the same type that comes out the other end or the program
deadlocks. This error is particularly insidious because
there is always a lot of communication in a network and
there are no messages to tell why the program has stopped.
For this reason, it is imperative to begin testing on a
B004, followed by testing on a single remote Transputer,
before testing a program on a network of Transputers.




Protocols are used to identify what is coming across a
channel; however, only data of that protocol will be able to
use the channel. A more flexible approach is the use of the CHAN
OF ANY declaration, which allows the programmer to send
anything down a channel as long as the same thing comes out the
other end.

Variant protocols offer the same flexibility. They are
expensive in terms of overhead, however. A cheaper but
trickier method is to declare channels using CHAN OF ANY and
sending arrays through these channels with a single byte or
integer tag at the head. The tag allows the receiver to know
what type of data follows in the array. This manual method
requires retyping, a practice requiring much care from the
programmer, and should be well tested on a single remote
Transputer before it is used in a network.
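The tag-at-the-head scheme can be modeled outside Occam. The Python sketch below is not the thesis code; it imitates the idea by carrying every message as a list of 32 bit integer words with a tag in the first word, and it uses the standard `struct` module to stand in for Occam retyping of REAL32 values. The tag values and function names are hypothetical.

```python
import struct

# Hypothetical tag values identifying what follows in the array.
TAG_INT = 0
TAG_REAL32 = 1


def encode(tag, values):
    """Build a message: [tag, word, word, ...] of 32 bit integer words."""
    if tag == TAG_INT:
        words = list(values)
    elif tag == TAG_REAL32:
        # "Retype" each float as the integer with the same bit pattern.
        words = [struct.unpack("<i", struct.pack("<f", v))[0] for v in values]
    else:
        raise ValueError("unknown tag")
    return [tag] + words


def decode(message):
    """Inspect the tag, then recover the payload in its original type."""
    tag, words = message[0], message[1:]
    if tag == TAG_INT:
        return tag, list(words)
    if tag == TAG_REAL32:
        return tag, [struct.unpack("<f", struct.pack("<i", w))[0]
                     for w in words]
    raise ValueError("unknown tag")


if __name__ == "__main__":
    print(decode(encode(TAG_INT, [1, 2, 3])))
    print(decode(encode(TAG_REAL32, [1.5, -2.0])))
```

If the receiver applies the wrong retyping for a tag, the words are silently misread rather than rejected, which is why the text recommends testing this method on a single remote Transputer first.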

All Transputer links may run at the standard speed of
10 Mbits/second; however, most members of the Transputer
family are now capable of running their links at
20 Mbits/second. The B004, B002, and B001 are the only commonly used
boards restricted to the standard link speed. Fortunately,
the 20 Mbits/second capable boards allow their Link 0s to be
set at 10 Mbits/second while the rest of the links run at 20
Mbits/second. Normally a network of Transputers is run at 20
Mbits/second for fastest communications except where an
input is accepted from a B004 host computer or B001/2 RS232
interface board. It is easy to tell if the DIP switches have
been improperly set; an error message will flash when the
extracted code will not load through the mismatched link.


E. DEBUGGING

The Transputer's internal architecture time slices
parallel processes to simulate parallel processing. This helps
the programmer because he can test his program on one
Transputer; then, with minimal change, run his program on a
network of Transputers as intended.

Based on programming experiences, it is recommended that
a program intended for a network of Transputers be developed
in three steps. First, the calculation process is tested
sequentially on the B004 T414 Transputer. The B004's host
computer allows the programmer to easily display any
variables in the executing program.

The next step is to run the same program on a remote
Transputer connected to the B004. Any bugs with the external
links or channel protocols will reveal themselves in this
step. Because a program may work on one Transputer using
internal channels but not on multiple Transputers using
external channels, this step is important. During execution,
variables may be passed over external links back to the B004
for display.

Finally, the program is run on a network of Transputers.
There may still be bugs but at least the programmer knows
that a large portion of his code is good, especially the
channel protocols. The first two testing systems remain and
the programmer can reuse them to check out any questions he
may have.

It is unfortunate that on a network of Transputers a
program either works or it deadlocks. When a program deadlocks,
all the programmer sees is a blinking cursor. There is very
little the programmer can do to determine what went wrong,
where the problem occurred, or to obtain a state trace.
Additionally, because the Transputers run in parallel, it is
next to impossible to display variables to a monitor during
execution.


A. GENERAL

A problem has to be divisible for use in a parallel
system. A parallel system of Transputers is so powerful that
the problem should also be computation intensive. This
thesis deals with problems that are both divisible and
computation intensive. Problems that are not divisible (assuming a
single input data stream) are not germane to this thesis.


B. PIPELINE

The pipeline is a well known work distribution
algorithm. It is ideal for problems that divide into tasks that
can be assigned to separate processors. With the Occam
programming language and the Transputer links, it is easy to
configure a network of Transputers into a pipeline, as
demonstrated in Figures 3.1 and 3.2.

Unfortunately, pipelines are only as fast as their slow- 
est process. In a pipeline, the Transputers execute the 
following cycle: 

WHILE TRUE
  SEQ
    communicate
    synchronize
    calculate
    synchronize
Figure 3.1

Transputer Pipeline


[4]CHAN OF ANY chan:
PAR
  input (in, chan[0])
  process.1 (chan[0], chan[1])
  process.2 (chan[1], chan[2])
  process.3 (chan[2], chan[3])
  output (chan[3], out)


Figure 3.2

Transputer Pipeline Code
To maximize performance, the calculation time of the
processes must be equal. This limits the problem range.

A special case is when each processor in a pipeline can
do the same task. Heat transfer along a wire is such a
problem, although in this case the pipeline is bidirectional.
The calculation time for each processor is equal and so the
network can achieve peak efficiency. The configuration for
this special pipeline, as demonstrated in Figures 3.3 and
3.4, is even simpler than that of the standard pipeline.
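The claim that a pipeline is only as fast as its slowest process can be checked with a simple timing model. The Python sketch below is an idealization, not a measurement: it assumes every stage advances in lock step, so each cycle lasts as long as the slowest stage, and it ignores communication time. The function names are assumptions made for illustration.

```python
def pipeline_time(stage_times, n_items):
    """Time for n_items to pass through a lock-step pipeline.

    Each cycle lasts as long as the slowest stage; the first item needs
    one cycle per stage and each later item adds one more cycle.
    """
    cycle = max(stage_times)
    return (n_items + len(stage_times) - 1) * cycle


def pipeline_efficiency(stage_times):
    """Fraction of each cycle the processors spend calculating."""
    return sum(stage_times) / (len(stage_times) * max(stage_times))


if __name__ == "__main__":
    print(pipeline_time([2, 5, 3], 10), pipeline_efficiency([2, 5, 3]))
    print(pipeline_time([4, 4, 4], 10), pipeline_efficiency([4, 4, 4]))
```

With unequal stages the faster processors idle for part of every cycle; with equal stages, as in the heat transfer case, the efficiency of this model reaches 1.0.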


C. WORKFARM 

The workfarm is a work distribution algorithm in which
each processor does the same task on part of a problem
instead of each processor doing a separate task as in the
standard pipeline. It is highly effective on problems that
can be divided up into independent work packets, where each 
packet consists of parameters necessary to calculate a part 
of the problem. Independent means that no packet is depen- 
dent on any others. If one packet of the total were lost, 
only a small piece of the problem would be missing. The 
amount of work required per packet may vary. 

The workfarm has two distinct parts: the controller and
the farm. The controller combines packets into request
bundles and passes the request bundles to the farm, ensuring
that there are never more bundles in the farm than the farm
can handle. The controller also receives the result bundles.
Figure 3.3

Bidirectional Pipeline


[num.nodes+1]CHAN OF ANY right, left:
PAR
  input (right[0], left[0])
  PAR i = 0 FOR num.nodes
    node (right[i], right[i+1],
          left[i+1], left[i])
  output (right[num.nodes], left[num.nodes])


Figure 3.4

Bidirectional Pipeline Code


The packets are grouped into bundles to optimize the length 
of the message arrays as discussed earlier. 

The farm consists of multiple nodes arranged in some
topology. This thesis uses a linear array of nodes for
reasons explained later. A workfarm of this type is pictured
in Figure 3.5.

When a node receives a request bundle and is not busy,
it accepts the new one for processing. However, if the node
is busy when the request bundle arrives, it passes the
bundle to the next node (further away from the controller).
Result bundles, arriving from the opposite direction, are
simply passed along to the next node until they reach the
controller. When a node finishes processing a bundle, it
sends its result bundle towards the controller.
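The routing rule for request bundles can be stated in a few lines. The Python sketch below models only the decision described above, with node 0 taken as the node nearest the controller; it is not the Occam node process, and the function name and the `busy` list are assumptions made for illustration.

```python
def route_request(busy):
    """Return the index of the node that accepts an arriving bundle.

    busy[i] is True when node i cannot accept a bundle; node 0 is the
    node nearest the controller. Each busy node passes the bundle one
    step further from the controller. Returns None if every node is
    busy, a situation the controller is expected to prevent.
    """
    for index, is_busy in enumerate(busy):
        if not is_busy:
            return index
    return None


if __name__ == "__main__":
    print(route_request([True, True, False, False]))  # third node accepts
    print(route_request([False, True, True]))         # first node accepts
    print(route_request([True, True]))                # farm is full
```

A bundle accepted at node `i` has been relayed by `i` nodes on the way out, and its result is relayed by the same `i` nodes on the way back.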

In the workfarm, each node has the same code, although
sometimes it may be desirable to make minor modifications to
the end node code.

The configuration code for a workfarm is listed in
Figure 3.6. Two arrays of channels are declared, one to carry
the request bundles out to the farm and the other to return
the result bundles back to the controller. Expanding the
farm of Transputers is as easy as changing the constant
'num.Transputers'. Note that the controller and node
processes each have been "placed" on a separate Transputer, in
this case a T414 controller and T800 nodes.
Figure 3.5

Linear Array Workfarm


[num.Transputers+1]CHAN OF ANY requests, results:
PLACED PAR
  PROCESSOR 100 T4
    controller (requests[0], results[0])
  PLACED PAR i = 0 FOR num.Transputers
    PROCESSOR i T8
      node (requests[i], requests[i+1],
            results[i+1], results[i])


Figure 3.6

Linear Array Workfarm Code
The controller consists of four processes running in
parallel as shown in Figures 3.7 and 3.8. Requests and
results are external channels passed in by the global
configuration process; all other channels are internal to the
controller and must be declared.

The generator initiates the entire workfarm process by 
creating the request bundles and passing them to the work 
router. If necessary, the generator may receive inputs from 
outside sources through external channels to build the pack- 
ets and bundles. The generator signals the handler just 
before it begins to pass out bundles and is in turn sig- 
nalled by the handler when the handler has received the last 
result bundle. 

The work router and result router are essentially buffer 
processes. Together, they also perform a vital valve func- 
tion to make sure the farm never exceeds its capacity for 
request bundles. The work router knows how many nodes are in 
the farm. Since each node has a buffer enabling it to hold 
two bundles at a time, the work router knows the farm can 
hold twice as many request bundles as the number of nodes. 
It was necessary to reduce by one the farm bundle capacity
known to the work router to avoid overloading the farm. 

When the work router passes a bundle to the farm, it
increments a counter. When the results router receives a
bundle, it signals the work router through the trigger
channel. When the work router is signaled, it decrements the
counter. The work router will only accept bundles from the
generator while the counter does not exceed the farm bundle
capacity. The code to implement this valve function is shown
in Figure 3.9.


[Diagram: the generator, work router, results router, and
handler, connected by the internal channels to.router,
from.router, to.handler, from.handler, and trigger, with the
external channels requests[0] and results[0].]

Figure 3.7

The Controller


CHAN OF ANY to.router, from.router:
CHAN OF ANY to.handler, from.handler:
CHAN OF ANY trigger:
PAR
  generator (to.router, to.handler, from.handler)
  work.router (to.router, trigger, requests)
  results.router (from.router, trigger, results)
  handler (from.router, to.handler, from.handler)


Figure 3.8

Controller Code

When the handler receives the last of the result bundles 
it signals the generator that the problem has been complet- 


ed. 





--- Work.router
VAL farm.capacity IS (num.Transputers * 2) - 1:
BOOL bundle.done:
INT count:
SEQ
  count := 0
  WHILE TRUE
    PRI ALT
      trigger ? bundle.done
        count := count - 1
      (count <= farm.capacity) & to.router ? bundle
        SEQ
          requests ! bundle
          count := count + 1


--- Results.router
VAL bundle.done IS TRUE:
WHILE TRUE
  SEQ
    results ? bundle
    PAR
      trigger ! bundle.done
      to.handler ! bundle


Figure 3.9

Valve Function Code
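The counting behavior of the valve can be exercised in isolation. The Python class below is a sequential model of the work router's counter only, not a translation of the Occam processes; it assumes the capacity of twice the number of nodes less one described in the text and mirrors the guard as printed in the figure. The class and method names are hypothetical.

```python
class Valve:
    """Sequential model of the work router's bundle counter."""

    def __init__(self, num_nodes):
        # Each node can hold two bundles; the capacity known to the
        # work router is reduced by one, as described in the text.
        self.capacity = 2 * num_nodes - 1
        self.count = 0

    def can_send(self):
        # Mirrors the guard (count <= farm.capacity) in the figure.
        return self.count <= self.capacity

    def send(self):
        """Work router passes one bundle to the farm."""
        if not self.can_send():
            raise RuntimeError("farm is at capacity")
        self.count += 1

    def result_received(self):
        """Results router signals over the trigger channel."""
        self.count -= 1


if __name__ == "__main__":
    valve = Valve(num_nodes=2)
    sent = 0
    while valve.can_send():
        valve.send()
        sent += 1
    print(sent)               # bundles admitted before the valve closes
    valve.result_received()
    print(valve.can_send())   # one result frees one slot
```

With this guard the model admits at most `capacity + 1` bundles, which equals twice the number of nodes, so the reduced capacity keeps the farm from being handed more than it can buffer.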


A single node on the farm consists of five processes: a
work router, a result router, a work buffer, a result
buffer, and a calculation process. A single node is shown
in Figure 3.10. Notice that the calculation process
is given low priority while the four communication processes
run at high priority, as explained in Chapter II.


CHAN OF ANY to.buffer, to.calculation:
CHAN OF ANY from.calculation, from.buffer:
CHAN OF BOOL signal:
PRI PAR
  PAR
    work.router (requests.in, requests.out,
                 to.buffer, signal)
    work.buffer (to.buffer, to.calculation)
    result.buffer (from.calculation, from.buffer)
    result.router (from.buffer, results.in, results.out)
  calculation (to.calculation, from.calculation)


Figure 3.10

Single Node Code


The work router accepts bundles from the requests-in 
channel. If the work buffer is full, the work router relays 
the bundle to the next node in line by way of the requests- 
out channel. If the work buffer is empty, the bundle is sent 
there and the work buffer full flag is set. 

The work and result buffers are present to decouple the
communication in the work and result router processes from
the calculation in the calculation process, as explained in
Chapter II.

The work buffer holds a single bundle at a time. When 
the calculation process accepts a bundle from the work buf- 
fer, the work buffer signals the work router that the buffer 
is empty. 
The calculation process is where all work occurs. There,
upon arrival, a bundle is separated into packets. The
packets are processed sequentially and the results grouped into
a result bundle. After the result bundle has been passed to
the result buffer, the calculation process is ready to
accept another bundle. Since the calculation process is
sequential, it could be coded in a programming language other
than OCCAM such as Ada, Pascal, or C and inserted into the
OCCAM workfarm harness.

The results buffer is a pure buffer that relays a result
bundle to the result router and waits for another result
bundle to arrive.

The results router receives result bundles either from 
the result buffer or the results-in external channel. In 
either case, the bundle is discharged along the results-out 
channel. An arriving bundle from the buffer is given prior- 
ity over the external channel so that the calculation pro- 
cess will be free for more work as soon as possible. 

There has to be some device to allow for matters of
initialization and reporting in the farm. The method used
in this thesis was to place a tag at the head of every
bundle array. Depending on the tag, a bundle could be of the
following types: setup, data, or report. Generally, upon
arrival of a bundle, the tag is examined, and action is
taken on the bundle accordingly.
D. TOPOLOGY

A workfarm is not limited to the linear array, although it is
an attractive option. Because of communication path lengths,
trees would seem better suited for a workfarm with large
numbers of Transputers. Consider linear array and binary
tree workfarms, each with 100 farm nodes, for example.
Assuming the problem was computation intensive enough so
that all 100 nodes would be utilized, a bundle would have to
travel through 99 nodes to get to the 100th node in the
linear array. In the binary tree, however, a bundle would
have to travel through at most six nodes to reach any of the
100 nodes. A trinary tree would lower the communication
overhead further. Given a workfarm of the same number of
Transputers working the same problem, trees are more
efficient than a linear array in terms of utilizing each
Transputer.
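The path length comparison can be reproduced with a short calculation. The Python sketch below assumes the tree is filled level by level with the controller feeding the root node, and it counts the nodes a bundle must pass through before reaching the farthest node. The function names are assumptions made for illustration.

```python
def linear_hops(num_nodes):
    """Nodes a bundle passes through to reach the last node of a line."""
    if num_nodes < 1:
        raise ValueError("need at least one node")
    return num_nodes - 1


def tree_hops(num_nodes, branching):
    """Nodes a bundle passes through to reach the deepest tree node.

    The tree is filled level by level: 1 node, then `branching` nodes,
    then `branching` squared, and so on.
    """
    if num_nodes < 1:
        raise ValueError("need at least one node")
    levels, capacity, width = 0, 0, 1
    while capacity < num_nodes:
        capacity += width
        width *= branching
        levels += 1
    return levels - 1


if __name__ == "__main__":
    print(linear_hops(100))    # linear array
    print(tree_hops(100, 2))   # binary tree
    print(tree_hops(100, 3))   # trinary tree
```

For 100 nodes this gives 99 for the linear array and 6 for the binary tree, matching the figures above, and 4 for a trinary tree.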

Trees were not used in the research for a number of
reasons, however. First of all, there were not enough
Transputers available to conduct research and achieve significant
results. A binary tree workfarm was implemented but the
performance gains were so minuscule that the author believes
that significant gains would not be realized until large
numbers of Transputers, perhaps 50 or more, were used. The
modification to the linear array workfarm algorithm to
achieve binary tree topology was slight and only occurred in
the work and result routing processes of the node process.
The actual code is listed in Appendix A. Secondly, the B003
boards, with four Transputers each, are not well suited to
implementing trees since half of the 16 links are hardwired.

A large number of processors must be arranged in a
regular fashion or the configuration code and communication
algorithms become too complex. Commonly used regular
structures are arrays (1D, 2D, 3D, ...), trees (binary, trinary,
etc.), and hypercubes. More complex but still regular
structures have been proposed such as hypernet [HwGh87]. Such
structures appear promising for networks of large numbers
of Transputers.
IV. WORKFARM PERFORMANCE


A. GENERAL 

The performance of a workfarm is dependent on the 
following factors: the number and speed of the Transputers, 
the number of packets in the problem, the number of computa- 
tions per packet, the packet size, and the number of packets 
per bundle. To determine the relationships between these 
factors and ultimately the number of Transputers required to 
complete a problem in a certain amount of time, research was
conducted on a standard workfarm using a variable problem.


B. ENVIRONMENT 

The research in this chapter was conducted on a workfarm
with a physical configuration as pictured in Figure 4.1. The
number of Transputers utilized in the farm portion was
varied from one to eight. The speed of an individual
Transputer depended on its respective type. The controller
process was placed on a remote T414-15 Transputer. The T414
on the B004 board was used only for the compiling,
extraction, and loading of code to the workfarm network. The
farm nodes were placed on T800 Transputers.

A B002 board with its T414-15 and RS232 interface was
utilized so that results could be displayed on a VT220
terminal. The B007 board was included in the network for any
future graphics work. It acted solely as a relay and did not
figure in any of the testing. The entire workfarm network
ran at a common link speed of 20 Mbits/second except for the
B004 and B002 boards.


C. TESTING APPARATUS 

To change problems on a workfarm requires modification
of the calculation process of the farm nodes, the generator
process, the handler process, and the message arrays that
pass throughout the farm. To test the workfarm, it was
necessary to be able to vary the problem size quickly and
easily. The problem consisted of a variable number of 32-bit
floating point (REAL32) multiplications and the constant
communication overhead of the workfarm algorithm. When the
farm nodes were initialized, each farm node was passed an
integer representing a certain number of calculations per
packet.

The generator process produced a certain number of
packets which were grouped into bundles and distributed to
the farm, as shown in Figure 4.2. When a calculation process
in the farm received a bundle, it simulated separating the
packets from the bundle and did "calcs.per.packet" REAL32
multiplications for each packet. The calculation process
then simulated assembling a result bundle and was ready for
the next bundle. This testing calculation process is shown
in Figure 4.3.


tag := data
SEQ i = 0 FOR total.bundles
  SEQ
    SEQ j = 1 FOR packets.per.bundle
      [bundle FROM (j TIMES 4) FOR 4] :=
                   [dummy.array FROM 0 FOR 4]
    requests ! bundle


Figure 4.2


Generator Process Code


SEQ i = 1 FOR packets.per.bundle
  SEQ
    [dummy.array FROM 0 FOR 4] :=
                 [bundle.in FROM (i TIMES 4) FOR 4]
    SEQ j = 0 FOR calcs.per.packet
      x := x * x
    SEQ j = 0 FOR 4
      bundle.out [j] := dummy (BYTE)


Figure 4.3


Calculation Process Code


A timer was started preceding the initialization phase. 
When the handler process received the last result bundle the 
problem was completed and the timer was stopped. The time 
required to complete the entire problem was simply the stop 
time minus the start time. 

During the report phase, the handler received packets
from the farm nodes that contained the number of bundles
that each node processed and the node identification. The
handler passed this information and the problem completion
time to the VT220 terminal for display.

The bundle used for communication in the workfarm
consisted of an array of bytes. The first four bytes were
retyped to represent the integer tag. The next eight bytes
were retyped into a setup array of two integers to pass
initialization values. When the bundles were used in their
normal role of carrying packets, they were essentially
"dummy" arrays, as the only meaningful information in each
bundle was the tag. Figure 4.4 lists the bundle declaration.


VAL packets.per.bundle IS 10000/total.bundles:
VAL bundle.size.int IS packets.per.bundle + 1:
VAL packet.size.int IS 1:
VAL bundle.size.byte IS bundle.size.int TIMES 4:
VAL packet.size.byte IS packet.size.int TIMES 4:

[bundle.size]BYTE bundle:
INT tag RETYPES [bundle FROM 0 FOR 4]:
[]INT setup.array RETYPES [bundle FROM 4 FOR 8]:


Figure 4.4


Bundle Declaration
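
For readers unfamiliar with RETYPES, the byte layout described above can be imitated in a few lines. This is a sketch only, not a translation of the thesis code: it assumes 32-bit little-endian integers, the function names are hypothetical, and the tag and setup values in the usage note are placeholders.

```python
import struct

INT_BYTES = 4  # one 32-bit integer


def bundle_size_bytes(packets_per_bundle, packet_size_int=1):
    """Bytes in a bundle: one integer tag followed by the packets."""
    return (1 + packets_per_bundle * packet_size_int) * INT_BYTES


def make_bundle(tag, packets):
    """Build a byte array holding the tag and then one integer per packet."""
    bundle = bytearray(bundle_size_bytes(len(packets)))
    struct.pack_into("<i", bundle, 0, tag)
    for j, value in enumerate(packets, start=1):
        struct.pack_into("<i", bundle, j * INT_BYTES, value)
    return bundle


def read_tag(bundle):
    """View bytes 0 to 3 as the integer tag."""
    return struct.unpack_from("<i", bundle, 0)[0]


def read_setup_array(bundle):
    """View bytes 4 to 11 as the two setup integers."""
    return struct.unpack_from("<2i", bundle, 4)
```

For example, `make_bundle(2, [120, 16])` yields a 12 byte array whose tag reads back as 2 and whose setup array reads back as (120, 16). A bundle of 16 one-integer packets occupies 68 bytes.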




The packet size used for the research was one integer.
This represented a unit packet with four byte-size
parameters. Most problems will have packets consisting of
some multiple of four bytes; for example, the Mandelbrot Set
problem has 12 byte request packets and 32 byte result
packets.


D. RESULTS AND CONCLUSIONS

Figure 4.5 illustrates some of the relationships that
exist in an eight Transputer workfarm among the many
variables that affect performance. The vertical axis shows
the time to calculate 10,000 packets. The horizontal axis
indicates the number of REAL32 multiplications each packet
represents. If calcs.per.packet is 100, the total
workload is 1,000,000 REAL32 multiplications.

Each line of the graph represents a different bundle
size, in terms of packets per bundle. For example, a bundle
made up of 16 packets is 17 integers long (68 bytes): 16 one
integer packets and a one integer tag.

Each line begins flat and at some point begins
increasing linearly. The flat line indicates two things.
While the line is flat, not all eight Transputers are being
utilized. For small bundles of one packet per bundle, the
farm is underutilized until the workload reaches 120
calculations per packet. As the bundle size increases, more
Transputers are utilized sooner. For example, at 16 packets
per bundle, the farm is fully utilized at approximately 30
calculations per packet.

The flat line also means that the workfarm is completing
an increasing number of calculations with no corresponding
increase in time. This is explained by observing how many
Transputers were utilized at each calculation point. As the
workload increased, the workfarm utilized more of its
available processors. Of course, there came a time when all
available Transputers, in this case eight, were in use. From
then on, the time required increased proportionally to the
increase in the workload.

Once all the farm Transputers were in use, and the
workload continued increasing, the time increased at a
constant rate. This rate was equal for all bundle sizes.

The question of time to complete a problem can best be
represented by the following equation:

    time := (calcs.per.packet * total.packets) + comm.ovhd

With a small number of calculations per packet, the
communication overhead is a significant factor in the
equation. As the workload increases, the communication
overhead decreases in significance.

The time required to complete the problem decreased as
the bundle size grew. At greater than 16 packets per bundle,
however, the time reductions became negligible. This was not
surprising, considering that with one packet per bundle,
half of the bundle was used for communication overhead (one
integer for the tag, one integer for the packet). With 16
packets per bundle, however, 1/17 of the bundle was
overhead, or 5.9 percent. Between one packet per bundle and
16 packets per bundle there was a large difference in the
overhead percentage; the reduction in time was
correspondingly great. The overhead percentage difference
between 16 and 32 packets per bundle was small, however, and
the reduction in time required was also correspondingly
small.
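
The overhead percentages quoted above follow directly from the bundle layout. A minimal sketch, assuming one integer of tag per bundle and one integer per packet as in the test problem (the function name is illustrative):

```python
def tag_overhead_fraction(packets_per_bundle, packet_size_int=1, tag_size_int=1):
    """Fraction of a bundle occupied by the tag rather than by packets."""
    total_int = tag_size_int + packets_per_bundle * packet_size_int
    return tag_size_int / total_int


if __name__ == "__main__":
    for packets in (1, 16, 32):
        percent = 100.0 * tag_overhead_fraction(packets)
        print(f"{packets:2d} packets per bundle: {percent:.1f} percent tag")
```

Going from 1 to 16 packets per bundle removes most of the tag overhead, while going from 16 to 32 changes it by only a few percentage points, which matches the diminishing returns observed in the timings.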

In each farm node process, one bundle is declared in 
each of the four communication processes and two bundles are 
declared (bundle.in and bundle.out) in the calculation pro- 
cess. Sixteen packets per bundle (17 integers) is a good 
bundle size since the bundle is large enough to yield near 
optimal performance yet small enough to require little on- 
chip memory. Bundle sizes greater than 667 bytes will leave
no room in the T800 on-chip memory for any other data
structures, because six bundle arrays of that size require
4002 bytes of memory, which is close to the T800 on-chip
memory capacity of 4096 bytes.
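
The memory budget can be sketched as follows. The figure of six bundle arrays per node comes from the paragraph above; the 4096 byte capacity is the on-chip RAM of the T800, and the function names are illustrative.

```python
T800_ON_CHIP_BYTES = 4096
BUNDLES_PER_NODE = 6  # four in the communication processes, two in calculation


def bundle_memory_bytes(bundle_bytes):
    """Total bytes taken by all bundle arrays declared in one farm node."""
    return BUNDLES_PER_NODE * bundle_bytes


def free_on_chip_bytes(bundle_bytes):
    """On-chip bytes left for everything else; negative means no room."""
    return T800_ON_CHIP_BYTES - bundle_memory_bytes(bundle_bytes)


if __name__ == "__main__":
    for size in (68, 667):
        print(size, bundle_memory_bytes(size), free_on_chip_bytes(size))
```

A 68 byte bundle (16 packets) costs 408 bytes in total, whereas a 667 byte bundle leaves fewer than 100 bytes of on-chip memory free.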

With 100 calculations per packet and 10,000 packets, the
total workload is 1,000,000 REAL32 multiplications plus
communication overhead. Dividing the workload by the total
time to complete the workload yields "workfarm mega-floating
point operations per second" (WF-MFLOPS). Plotting WF-MFLOPS
versus number of Transputers yields the graph in Figure 4.6.


Figure 4.6


Linear Performance


The resulting nearly straight line indicates that the
workfarm is achieving near linear performance; that is to
say, WF-MFLOPS is directly proportional to the number of
Transputers in the farm.

It should be realized that at some number of
Transputers, the line in Figure 4.6 will turn sharply
horizontal. Where on the graph this occurs depends on the
workload and number of Transputers. It represents the point
at which the controller cannot provide request bundles fast
enough to keep all the farm nodes busy. Adding more
Transputers to the farm beyond this point is wasteful, since
no further increase in work capacity can ever occur.

It would be helpful to be able to predict either how
many Transputers in a farm are needed to solve a problem in
a certain amount of time or how long a problem will take
given a certain number of farm Transputers. To do so, one
has to realize that a workfarm can be in one of two limiting
conditions, depending on the workload and number of
Transputers. The first case is when the workfarm is
"calculation limited"; that is, the ultimate performance is
limited by the workload, not by the controller request
bundle generation rate. The second case is when the workfarm
is "communication limited"; with small workloads, the farm
nodes are able to complete request bundles faster than the
controller can supply them.

The workfarm performance can be characterized by a set
of equations using the notation in Table 4.1. Figure 4.7 is
provided as a reference.


The time to solve a problem on a workfarm (T) is obviously

The total number of bundles (N) in a problem is a known
factor that can be used for predictions. The rate that
bundles can be accepted and processed by a single Transputer
node without degradation (B) can also be determined
beforehand by measuring the maximum number of bundles that a
single farm Transputer can process in a certain time period.

Consider the calculation limited workfarm, which is
characterized by a large workload. There are two predictions
that can be made. First of all, if there is a certain number
of Transputers available, one can determine how long it will
take to complete the problem. Secondly, if one has a time
limit in which to complete the problem, one can determine
how many Transputers will be required, assuming the problem
does not become controller limited.

Research showed that when the farm was calculation
limited, each farm node processed approximately the same
number of bundles, with a slight increase in processed
bundles from first to last node. This corresponds to the
decreasing degradation in the farm as the nodes further from
the controller have to route fewer and fewer bundles. If the
number of Transputers (n) is known, and relatively few
Transputers are used, the number of bundles processed by
each farm node is approximately N / n, or N_i. Since N_i is
approximately the same for each node (discounting
degradation) and the farm is calculation intensive, the
overall time required by one node to complete N_i bundles
can be assumed to be approximately the same as the overall
time taken to complete the entire problem. This simple
method to predict the overall time of a computation limited
workfarm was effective to within five percent of the actual
results. The resulting equation is:

    T = N / (n * B)
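
A minimal sketch of the calculation limited estimate follows. It assumes that each of the n nodes processes about N / n bundles at the single node rate B, so that T is approximately N / (n * B), and it also gives the same relation rearranged for n. The function names and the numbers in the usage note are illustrative, not measurements from the thesis.

```python
import math


def predicted_time(total_bundles, num_transputers, node_rate):
    """Estimated completion time T = N / (n * B), in seconds."""
    return total_bundles / (num_transputers * node_rate)


def transputers_needed(total_bundles, time_limit, node_rate):
    """Smallest whole number of nodes n with N / (n * B) <= time limit."""
    return math.ceil(total_bundles / (time_limit * node_rate))
```

With N = 625 bundles and a hypothetical B of 10 bundles per second, eight nodes give `predicted_time(625, 8, 10.0)` = 7.8125 seconds, and a limit of 8 seconds gives `transputers_needed(625, 8.0, 10.0)` = 8. The estimate ignores routing degradation and is only meaningful while the farm is not controller limited.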


To complete the problem in a given amount of time, one
can calculate approximately how many Transputers will be
required by merely switching T and n in the above equation:

    n = N / (T * B)

The communication limited case is characterized by a low
workload with the farm nodes accepting request bundles at a
greater rate than in the calculation limited case.
Obviously, the more request bundles accepted, the more
result bundles generated; and in general, a greater
percentage of processor time is spent passing bundles. The
lower the workload, the sooner the farm will be able to
exceed the controller's ability to supply bundles. If the
first few nodes can match the rate at which the request
bundles come from the controller, then nodes further down
the line in the farm simply will not receive any work. Thus,
the controller generation rate becomes the limiting factor
as r decreases from m. This generation rate includes the
overhead caused by the controller having to accept incoming
result bundles. The number of nodes in the farm that do
useful work is dependent on the capacity of the controller
to supply request bundles.

It would be useful to know at what point a particular
problem becomes communication limited; that is, for any
problem, what is the maximum number of nodes in the farm
that will do useful work. The workload and total number of
bundles are known from the problem description. A rough
estimation may be reached by assuming that at equilibrium,
where some number of farm nodes are able to match the
controller's request bundle generation rate (r), each
working node is processing approximately the same number of
bundles. Dividing r by the single node maximum calculation
rate (B) yields the approximate number of nodes that will do
useful work. Unfortunately, r is difficult to ascertain
without implementation and testing. An upper bound may be
obtained by substituting m for r. By testing on a workfarm
with a single farm node, as shown in Figure 4.8, m can be
determined. The single workfarm node merely accepts request
bundles and returns to the controller a simulated result
bundle without doing any calculations. Thus an upper bound
for the controller capacity is easily measured.

Figure 4.9 shows the actual controller request bundle
generation rate (r) versus increasing workload.

The initial rate versus a workload of zero is a high
8000 bundles per second. This zero workload rate, converted
to 550 Kbytes per second, approaches the theoretical maximum
unidirectional link rate of a T414, 750 Kbytes per second,
and is in fact m. The zero workload rate will never match
the theoretical maximum rate because of the controller
overhead. When a load is put on the farm, the actual rate
drops off sharply to approximately 5700 bundles per second
and then decreases gently at a nearly constant rate of
approximately 100 rate units per workload unit. This rate
holds until the farm begins to become calculation limited
and the rate drops sharply again.
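
The conversion from bundles per second to Kbytes per second can be reproduced under one assumption that the text does not state explicitly: that the measurement used 68 byte bundles (16 packets per bundle). A Kbyte is taken here as 1000 bytes, and the function names are illustrative.

```python
T414_MAX_LINK_BYTES_PER_SECOND = 750_000  # theoretical figure quoted in the text


def link_rate_bytes_per_second(bundles_per_second, bundle_bytes):
    """Payload rate on the controller's outgoing link."""
    return bundles_per_second * bundle_bytes


def link_utilization(bytes_per_second, max_rate=T414_MAX_LINK_BYTES_PER_SECOND):
    """Fraction of the theoretical unidirectional link rate in use."""
    return bytes_per_second / max_rate


if __name__ == "__main__":
    rate = link_rate_bytes_per_second(8000, 68)
    print(rate, round(link_utilization(rate), 3))
```

Under that assumption 8000 bundles per second corresponds to 544,000 bytes per second, or roughly 73 percent of the theoretical link rate, which is in line with the rounded 550 Kbytes per second quoted above.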

Since r is relatively constant during the communication
limited portion of the graph, it can be used in the
following rule of thumb equation for determining the maximum
number of useful farm Transputers in a communication limited
workfarm:

    Transputer limit = r / B

Again, m can be substituted for r in this equation to
determine an upper bound on the number of Transputers a
particular workload will utilize. Determining r accurately
without testing the actual problem on a farm of multiple
Transputers was a difficult problem. The only success in
obtaining an approximate value of r was to test the actual
problem on a farm of at least four Transputers. The
resulting r value could then be used to project a Transputer
limit for problems with larger workloads. The validity of
this projection is a function of the accuracy of r and B.

Any time the number of Transputers in the farm is less
than r / B, those Transputers are going to be fully
utilized, each processing approximately N / n bundles.
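
The rule of thumb can be written as a small sketch. The function names are hypothetical; r, m, and B are as defined in the text, and the value of B in the usage note is a placeholder rather than a measurement.

```python
def transputer_limit(controller_rate, node_rate):
    """Approximate number of farm nodes that receive useful work: r / B."""
    return controller_rate / node_rate


def fully_utilized(num_transputers, controller_rate, node_rate):
    """True while the farm is smaller than the limit r / B."""
    return num_transputers < transputer_limit(controller_rate, node_rate)


def bundles_per_node(total_bundles, num_transputers):
    """Approximate share of each node in a fully utilized farm: N / n."""
    return total_bundles / num_transputers
```

For instance, with r = 5700 bundles per second and a hypothetical B of 1000 bundles per second, the limit is 5.7 nodes. Substituting the zero workload rate m = 8000 for r gives an upper bound of 8 nodes.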




V. PREDICTIONS 


A. GENERAL 

The research conducted in Chapter IV can be used to 
predict the performance of the workfarm in some cases. Two 
different problems are used in this chapter as examples: 
coordinate transformation and Mandelbrot set drawing. Actual 
performance results from these two problems were compared
with the predictions.


B. COORDINATE TRANSFORMATION PROBLEM 

This coordinate transformation problem originated from
research for an autonomous walking machine [Ri87]. The
following is a brief description of the problem and its
implementation on a workfarm. For in-depth coverage of the
problem itself, the reader is referred to [Ri87].

An autonomous land vehicle, known as the Adaptive
Suspension Vehicle, possesses an optical radar scanner which
the vehicle uses to "see" the forward terrain. The scanner
returns range measurements for each elevation and azimuth
increment in its scan. A single scan consists of 128 azimuth
increments for each of 128 elevation increments, a total of
16,384 iterations. The azimuth, elevation, and range are
combined with six other inputs from the vehicle's inertial
navigation system to develop a Cartesian coordinate position
of the particular scanned point. The elevation of every
point in the scan is kept in a terrain matrix data
structure.

For the workfarm implementation, each packet represented 
one scan iteration and consisted of three bytes representing 
the scanner azimuth angle, the scanner elevation angle, and 
the resulting range. There were 16384 packets in one scan. 

The vehicle's inertial navigation system (INS) supplied
the vehicle attitude and position information, a total
of six inputs. The attitude information consisted of the
vehicle's azimuth angle, pitch angle, and roll angle. The
positional information consisted of the vehicle's x, y, and
z transformation (distance) from the INS reference point.
Because these six inputs remained constant for the entire
scan, they could be passed to the farm and processed as much
as possible during the initialization phase of the workfarm.

The radar scanner was easily implemented on a separate
"scanner" process. Byte range values for a flat, zero
elevation scan were calculated and then sequentially passed
to the controller process through a channel. Although not
implemented, the rate of outgoing range values could be
easily controlled through use of the Transputer timer.

Each of the 16384 packets in a single scan was processed
in the following way. Each byte in the packet was first
hashed into a useful 32-bit floating point real number
(REAL32). For example, the scanner azimuth range was -40 to
+40 degrees. Since a byte can only represent the numbers 0
to 255, the scanner azimuth byte had to be converted into
the correct data upon arrival in a farm node. Each bundle
contained a byte tag and 128 packets.
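
The byte conversion described above can be illustrated as follows. The thesis does not give the mapping it used, so this sketch assumes a simple linear scaling in which byte 0 corresponds to -40 degrees and byte 255 to +40 degrees. The function names are hypothetical.

```python
def byte_to_angle(value, low=-40.0, high=40.0):
    """Map a byte (0..255) linearly onto the range [low, high] in degrees."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte")
    return low + (high - low) * value / 255.0


def angle_to_byte(angle, low=-40.0, high=40.0):
    """Inverse mapping, rounded to the nearest representable byte."""
    if not low <= angle <= high:
        raise ValueError("angle outside the scanner range")
    return round((angle - low) / (high - low) * 255.0)
```

Under this assumption the azimuth resolution is 80 / 255, or about 0.31 degrees per byte step. The same scheme would apply to the elevation and range bytes with their own limits.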

For each packet the three new parameters were combined
with the six parameters received during the initialization
phase using a Denavit-Hartenberg (D-H) transformation to
yield the x, y, and z coordinates of the particular spot
being scanned [Ri87]. These three coordinates were
themselves converted into bytes, loaded into the result
bundle, and eventually passed back to the controller.

The problem was implemented on a standard workfarm as
described in Chapters III and IV; that is, the controller
process was placed on a T414-15 and eight farm nodes were
placed on eight T800s. The scanner process was placed on a
separate T414-15 and shared a single link with the
controller process. The calculation process of a farm node
is listed in Appendix C.

Testing on a single T800 Transputer yielded an
approximate calculation rate (B) of 43.63 bundles per
second. Total bundles for one scan (N) was 128.

The actual time for the workfarm to process a single
scan (T) was 0.427 seconds, including loading the resultant
altitude values into a terrain map matrix. The farm was
controller limited, with 6.3 of the 8 Transputers being
utilized.

The actual controller request bundle flow rate (r) was
300 bundles per second. If this value of r could have been
estimated correctly beforehand, the predicted Transputer
limit for this problem would have been r divided by B, or
6.9 Transputers. This is very close to the 6.3 Transputers
that were actually utilized.
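
The figures in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the text; the function and key names are illustrative.

```python
def coordinate_scan_check():
    """Recompute the quoted quantities for one scan of the coordinate problem."""
    scan_points = 128 * 128        # azimuth by elevation increments
    packets_per_bundle = 128
    node_rate = 43.63              # B, bundles per second on one T800
    controller_rate = 300.0        # r, measured bundles per second
    measured_time = 0.427          # T, seconds per scan

    total_bundles = scan_points // packets_per_bundle
    return {
        "total_bundles": total_bundles,
        "predicted_limit": controller_rate / node_rate,
        "observed_bundles_per_second": total_bundles / measured_time,
    }


if __name__ == "__main__":
    for name, value in coordinate_scan_check().items():
        print(name, round(value, 2))
```

The scan yields 128 bundles, the predicted limit r / B is about 6.9 Transputers, and 128 bundles in 0.427 seconds is about 300 bundles per second, which is consistent with the measured value of r.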


C. MANDELBROT SET

The drawing of the Mandelbrot Set is a problem that
demonstrates the best qualities of the workfarm. It is a
problem that is extremely computation intensive and where
the amount of work each packet will entail is unknown.

For an in-depth description of the Mandelbrot Set
problem, the reader is directed to [P0086]; this chapter
mainly discusses implementation and performance.

Essentially, the Mandelbrot set is generated by iterating
a simple function on the points of the complex plane. The
points that produce a cycle (the same value over and over
again) fall in the set, whereas the points that diverge
(give ever-growing values) lie outside it. When plotted on
a computer screen in many colors (different colors for
different rates of divergence), the points outside the set
can produce pictures of great beauty.
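
The iteration described above can be sketched as follows. This is the usual escape-time formulation, not the thesis code (which is in Appendix D): it assumes the function z*z + c, an escape test of |z| > 2, and a cap of 256 loops per coordinate. The packet helper, its step parameter, and all names are hypothetical.

```python
def escape_iterations(c, max_iterations=256):
    """Number of loops of z = z*z + c before |z| exceeds 2, capped at the limit."""
    z = 0j
    for count in range(1, max_iterations + 1):
        z = z * z + c
        if abs(z) > 2.0:
            return count
    return max_iterations


def process_packet(start, step, points=16, max_iterations=256):
    """Iterate one starting coordinate and the next 15 to its right."""
    return [escape_iterations(start + k * step, max_iterations)
            for k in range(points)]
```

A point such as c = 0 never diverges and therefore costs the full 256 loops, while c = 2 + 2i escapes on the first loop. This spread is what makes the workload of a packet unpredictable in advance.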

The problem is divided into independent work packets,
each packet containing an integer tag and two other integers
that represent a coordinate position on the complex plane.
As stated before, the packet workload is variable; there is
no way of knowing beforehand how much calculation each
packet will require. Each packet really represents 16
coordinates because the farm node uses the single coordinate
in the packet as the starting point for 15 more consecutive
horizontal coordinates. Each iteration can entail from one
to 256 loops of approximately ten arithmetic operations
each.

A complication of this particular problem is that the
handler in the controller has to pass result bundles to the
graphics routine on the B007 graphics board where the
results are drawn. The controller flow rate (r) is lowered
because of this overhead.

The implementation of the Mandelbrot set onto the work- 
farm is somewhat different than in previous implementations. 
There is no benefit in bundling together packets to improve 
communication efficiency because of the variable packet 
workload. A request packet is already fairly large, 12
bytes, and a result packet is even larger, 32 bytes. Both
request and result packets represent 16 coordinate points 
which can represent a massive amount of computation. 

The generator and handler processes of the controller 
and the calculation process of the farm node are listed in 
Appendix D. 

The problem was implemented on the same workfarm con- 
figuration as the coordinate transformation problem with the 
exception of the scanner. 

The coordinate matrix was 512 by 512; thus there were
16384 packets (N), each representing 16 horizontal
coordinates as stated before.

When each coordinate only required one loop iteration, a
solid dark gray screen was drawn in 1.5 seconds. The farm
was controller limited as only 3.2 of the 8 Transputers were
utilized. When each coordinate required 256 iterations, the
workfarm required 81.5 seconds to draw a solid black screen.
The farm, of course, was computation limited and all eight
Transputers were utilized with only a slight variation in
the number of packets processed by each.

Because the packet workload is variable, predictions are
possible only when each coordinate represents the same
number of loop iterations; i.e., the screen is solid black
(256 iterations per coordinate) or solid dark gray (one
iteration per coordinate).

Testing on a single T800 Transputer yielded an
approximate calculation rate (B) of 39 packets per second.
Using this value in the calculation limited equation to
predict the actual time to draw a solid black screen on a
farm of eight T800 Transputers yielded 52.5 seconds. As
noted previously, however, the actual time was 81.5 seconds.
This discrepancy arose because the controller was limiting
the farm although the problem was still calculation limited.
The controller limited the farm because the controller's
handler had to send every result packet to the graphics
Transputer for drawing. This caused the controller to wait
because the graphics process would not accept another packet
until the previous one had been completed (drawn). This
waiting translated into significant overhead.

The actual problem was again tested, this time without
the controller handler having to pass results to the
graphics Transputer (no picture was drawn); the controller
handler merely accepted results from the farm. The actual
time in this case was approximately 53 seconds, very close
to the original prediction.
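
The prediction and the two measurements can be compared directly. The sketch below uses only the values quoted in this section and assumes the calculation limited relation T = N / (n * B). The function and key names are illustrative.

```python
def mandelbrot_prediction_check():
    """Compare the predicted solid black drawing time with both measurements."""
    packets = (512 * 512) // 16   # N: one packet per 16 horizontal coordinates
    farm_nodes = 8                # n
    node_rate = 39.0              # B, packets per second on one T800
    measured_with_graphics = 81.5
    measured_without_graphics = 53.0

    predicted = packets / (farm_nodes * node_rate)
    return {
        "packets": packets,
        "predicted_seconds": predicted,
        "error_with_graphics": (measured_with_graphics - predicted) / predicted,
        "error_without_graphics": (measured_without_graphics - predicted) / predicted,
    }


if __name__ == "__main__":
    for name, value in mandelbrot_prediction_check().items():
        print(name, round(value, 3))
```

The predicted time is about 52.5 seconds. The run without graphics output is within about one percent of it, while the run that drew the picture took roughly 55 percent longer.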

Clearly, the equations do not work if the controller has
to do work on the results after reception. In this case, for
the equations to remain applicable, the controller handler
needs to relay work to a buffer Transputer between the
controller and the graphics Transputer (for example). Or, if
much work of a different type is needed, the results could
be passed to another, separate farm.


VI. CONCLUSIONS AND RECOMMENDATIONS 


A. CONCLUSIONS 

This thesis deals primarily with the work distribution
algorithm known as workfarm. Although limited to problems
divisible into independent work packets, it is a simple yet
extremely effective way of processing in parallel and
achieving near linear performance speedups. Processing rapid
streams of data from a weapon system sensor would seem to
fall into the workfarm category of problems. The coordinate
transformation example demonstrates that the workfarm is
well suited for a radar problem.


B. RECOMMENDATIONS 

Although the equations for workfarm performance have 
been developed in this thesis, research on how to accurately 
estimate the controller request bundle flow rate for a farm 
of a given number of Transputers and given workload needs to 
be done to predict the limit at which a farm becomes con- 
troller limited. For the same reason, research is needed to 
accurately estimate the degradation factor for each node in
the farm.

Both the pipeline and workfarm are good for specific
types of problems; however, much work remains in developing
and evaluating new work distribution algorithms for other
types of problems. Specifically, [HwGh87] appears promising
as a source for new work distribution algorithms.

Much work remains, too, in development and evaluation of
new network physical topologies. For example, in a workfarm,
a simple binary tree farm topology would allow a controller
to supply the farm with a far greater rate of request
bundles than it could to a linear array farm. Other
topologies, such as hypernet [HwGh87], would be extremely
interesting to implement. Currently there are too few
Transputers in the lab to significantly explore these
different topologies; perhaps more topology research can be
done when greater numbers of Transputers are available. The
relative ease of configuring multi-Transputer networks by
virtue of the OCCAM programming language makes widespread
research in network topology practical for the first time.

An Ada compiler will be available for the Transputer
family soon; since Ada is the Department of Defense standard
programming language, research on the implementation of Ada
on Transputers should begin as soon as the compiler arrives.

The ultimate goal of an ongoing series of Transputer
theses at the U.S. Naval Postgraduate School, of which this
thesis is part, is to develop an alternative computer
architecture for the U.S. Navy Aegis Combat System. The
expertise base represented by this series of theses has
reached the point where the next step should be the
simulation of the AN/SPY-1A 3D Phased Array Radar
Controller, the main component of the Aegis Combat System.
It seems likely that the workfarm would have some utility in
such a simulation. One fact stands out from working with
Transputers: the Transputer system is revolutionary. Its
performance jump over anything short of a supercomputer is
orders of magnitude. True parallel processing is implemented
easily with a high level programming language, employing the
best elements of software engineering. The Transputer system
seems especially useful for the military, considering the
Transputer's suitability for embedded control applications
and parallel processing networks.


APPENDIX A


BINARY TREE WORKFARM SOURCE CODE


The only difference between the binary tree and the
linear array workfarms occurs in the request and result
routers of the farm node. To keep the program compact, two
separate versions of the farm node were developed, a fork
node and a leaf node. Only these two are listed in this
appendix. The complete algorithm for a linear array workfarm
is in Appendix B.


PROC fork(CHAN OF ANY requests.in, requests.out.left,
                      requests.out.right,
                      results.in.left, results.in.right,
                      results.out,
          VAL INT proc.id)

  CHAN OF ANY from.result.router, signal:
  CHAN OF ANY to.buffer, to.calculation:
  CHAN OF ANY from.buffer, from.calculation:

  PRI PAR
    PAR
      VAL left IS FALSE:
      VAL right IS TRUE:
      communication
    calculation


------------------------------------------------------------


--- declarations
[bundle.size]BYTE bundle:
INT tag RETYPES [bundle FROM 0 FOR 4]:
BOOL d.buffer.empty, switch:
INT num.left, num.right:

SEQ
  --- initialization
  d.buffer.empty := TRUE
ie 
PEeOc. 1d 280 
SEO 
num.left 
HUM sens 
jSpatoler. cle pas 16 
SEQ 
NUM. Lore 
DUM, come 


| || 
O’ 


Ht oll 
we) 


  WHILE TRUE
    PRI ALT
      ALT
        signal ? d.buffer.empty
          SKIP
        from.result.router ? switch
          IF
            switch = left
              num.left := num.left + 1
            switch = right
              num.right := num.right + 1
      requests.in ? bundle
        IF
          tag = data
            IF
              d.buffer.empty
                SEQ
                  d.buffer.empty := FALSE
                  to.buffer ! bundle
              num.left >= num.right
                SEQ
                  num.left := num.left - 1
                  requests.out.left ! bundle
              else
                SEQ
                  num.right := num.right - 1
                  requests.out.right ! bundle
          tag = setup
            PAR
              requests.out.left ! bundle
              requests.out.right ! bundle
              to.buffer ! bundle
          tag = report
            PAR
              requests.out.left ! bundle
              requests.out.right ! bundle
              to.buffer ! bundle


------------------------------------------------------------


op op ee ee ee eee eee eo es ee es ie oe es es ee ee ess es ee ie es es i eo ee ee eo eee ee eee eee ee oe 


[bundle.size]BYTE bundle: 
WHILE TRUE 
PRI ALT 
BEeOmM.OULEeCr 2? bundle 
results.out ! bundle 
results.in.left ? bundle 
PAR 
meonm. FeSULt.roukcer ! left 
results.out ! bundle 
bes@ices.in.right ? bundle 
PAR 
Peon. beSulLe,.roucer ! right 
results.out ! bundle 
PROC leaf(CHAN OF ANY requests.in, results.out,
          VAL INT proc.id)
  CHAN OF ANY signal:
  CHAN OF ANY to.buffer, to.calculation:
  CHAN OF ANY from.buffer, from.calculation:
  PRI PAR
    PAR
      communication
    calculation


--- declarations
[bundle.size]BYTE bundle:
INT tag RETYPES [bundle FROM 0 FOR 4]:
BOOL d.buffer.empty:
SEQ
  d.buffer.empty := TRUE
  WHILE TRUE
    PRI ALT
      signal ? d.buffer.empty
        SKIP
      requests.in ? bundle
        IF
          tag = data
            IF
              d.buffer.empty
                SEQ
                  d.buffer.empty := FALSE
                  to.buffer ! bundle
              else
                SKIP
          tag = setup
            to.buffer ! bundle
          tag = report
            to.buffer ! bundle


--------------------------------------------------------------

[bundle.size]BYTE bundle:
WHILE TRUE
  SEQ
    from.buffer ? bundle
    results.out ! bundle
APPENDIX B


GENERIC WORKFARM SOURCE CODE


--- global variable file 


VAL MAXnumtT8 Is ele 

VAL total.bundles TS B25: 

VAL total.packets iS LOOO OE 

VAL packets.per.bundle IS total.packets/total.bundles: 
VAL bundle.size.int IS packets.per.bundle + 1:
VAL packet.size.int iS Ale 

VAL base.calc LS 116) 

WAL calc.loops 13 eG 

VAL bundle.size IS bundle.size.int TIMES 4:
VAL packet.size IS packet.size.int TIMES 4:
VAL farmSIZE IS MAXnumT8:

VAL workSIZE iS CearmiokZn LIMES 32) =): 
VAL data is ibe 

VAL setup iS ier 

VAL report Is Sie 

VAL else IS TRUE:

--------------------------------------------------------------


PROC root(CHAN OF ANY to.graph, from.graph, requests,
          results)
  --- internal channels
  CHAN OF ANY to.handler, from.handler,
              to.router, from.router,
              trigger:
  PAR
    generator(to.router, to.handler, from.handler)
    work.router(to.router, trigger, requests)
    results.router(from.router, trigger, results)
    handler(from.router, to.handler, from.handler, to.graph,
            from.graph)
PROC node(CHAN OF ANY requests.in, requests.out,
          results.in, results.out,
          VAL INT proc.id)
  --- internal channel declarations
  CHAN OF ANY to.buffer, to.calculation:
  CHAN OF ANY from.calculation, from.buffer:
  CHAN OF BOOL signal:
  PRI PAR
    PAR
      request router
      request buffer
      result buffer
      result router
    calculation
--------------------------------------------------------------
PROC generator(CHAN OF ANY to.router, to.handler,
               from.handler)
--------------------------------------------------------------
  --- declarations
  [bundle.size]BYTE bundle:
  INT tag RETYPES [bundle FROM 0 FOR 4]:
  []INT setup.array RETYPES [bundle FROM 4 FOR 8]:
  INT any:
  SEQ calcs.per.packet = 1 FOR calc.loops
    SEQ
      --- start clock
      to.handler ! calcs.per.packet

      --- initialize nodes
      tag := setup
      to.router ! bundle

      --- generate and send packets
      tag := data
      setup.array[0] := base.calc TIMES calcs.per.packet
      SEQ j = 0 FOR total.bundles
        to.router ! bundle

      --- wait till all results have been received and graphed
      from.handler ? any

      --- request report
      tag := report
      to.router ! bundle


--------------------------------------------------------------
PROC results.router(CHAN OF ANY from.router, trigger,
                    results)
  --- declarations
  [bundle.size]BYTE bundle:
  VAL packet.done IS TRUE:
  WHILE TRUE
    SEQ
      results ? bundle
      PAR
        trigger ! packet.done
        from.router ! bundle
PROC work.router(CHAN OF ANY to.router, trigger, requests)
  --- declarations
  [bundle.size]BYTE bundle:
  INT tag RETYPES [bundle FROM 0 FOR 4]:
  BOOL packet.done, reporting:
  INT workCOUNT:
  SEQ
    --- initialization
    workCOUNT := 0
    reporting := FALSE
    WHILE TRUE
      PRI ALT
        trigger ? packet.done
          IF
            NOT reporting
              workCOUNT := workCOUNT - 1
            else
              SKIP
        (workCOUNT <= workSIZE) & to.router ? bundle
          IF
            tag = data
              SEQ
                requests ! bundle
                workCOUNT := workCOUNT + 1
            tag = setup
              SEQ
                reporting := FALSE
                requests ! bundle
            tag = report
              SEQ
                reporting := TRUE
                requests ! bundle
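
The flow control in work.router can be modelled outside occam. The following is a minimal Python sketch, not part of the thesis code: the class and function names are hypothetical, and it assumes only what the listing shows, namely that the guard on to.router is (workCOUNT <= workSIZE), that each data bundle increments the count, and that each trigger decrements it unless a report has been requested.

```python
class WorkRouterModel:
    """Toy model of the work.router flow control (not the occam process itself)."""

    def __init__(self, work_size):
        self.work_size = work_size   # plays the role of workSIZE
        self.work_count = 0          # plays the role of workCOUNT
        self.reporting = False

    def can_accept(self):
        # Mirrors the guard (workCOUNT <= workSIZE) on the to.router input.
        return self.work_count <= self.work_size

    def on_trigger(self):
        # A result came back; outside the reporting phase it frees one slot.
        if not self.reporting:
            self.work_count -= 1

    def on_bundle(self, tag):
        if not self.can_accept():
            raise RuntimeError("guard is closed; the bundle would wait")
        if tag == "data":
            self.work_count += 1
        elif tag == "setup":
            self.reporting = False
        elif tag == "report":
            self.reporting = True


def bundles_accepted_before_blocking(work_size):
    """Count data bundles admitted before the guard closes, with no results returning."""
    router = WorkRouterModel(work_size)
    accepted = 0
    while router.can_accept():
        router.on_bundle("data")
        accepted += 1
    return accepted


if __name__ == "__main__":
    print(bundles_accepted_before_blocking(3))
```

Because the guard uses <= rather than <, the model admits workSIZE + 1 data bundles before it blocks; whether that extra bundle is intentional is not stated in the listing.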
PROC handler(CHAN OF ANY from.router, to.handler,
             from.handler, screen, keyboard)
--------------------------------------------------------------
  --- declarations
  [bundle.size]BYTE bundle:
  []INT report.array RETYPES [bundle FROM 0 FOR 8]:
  INT node.id IS report.array[0]:
  INT num.node.bundles IS report.array[1]:
  VAL go IS 1:
  TIMER clock:
  INT start.time, stop.time:
  INT calcs.per.packet:
  SEQ
    WHILE TRUE
      SEQ
        --- start clock on controls command
        to.handler ? calcs.per.packet
        clock ? start.time

        --- receive data packets
        SEQ i = 0 FOR total.bundles
          from.router ? bundle

        --- stop the timer
        clock ? stop.time

        --- let controller know all done graphing
        from.handler ! go

        --- make terminal report
        write.int(screen, (stop.time-start.time) TIMES 64, 9)
        write.int(screen, (calcs.per.packet TIMES base.calc), 4)
        SEQ i = 0 FOR farmSIZE
          SEQ
            from.router ? bundle
            write.int(screen, num.node.bundles, 4)
        newline(screen)
--- declarations
[bundle.size]BYTE bundle:
INT tag RETYPES [bundle FROM 0 FOR 4]:
BOOL d.buffer.empty:
SEQ
  d.buffer.empty := TRUE
  WHILE TRUE
    PRI ALT
      signal ? d.buffer.empty
        SKIP
      requests.in ? bundle
        IF
          tag = data
            IF
              d.buffer.empty
                SEQ
                  d.buffer.empty := FALSE
                  to.buffer ! bundle
              else
                requests.out ! bundle
          tag = setup
            IF
              proc.id < (MAXnumT8-1)
                PAR
                  requests.out ! bundle
                  to.buffer ! bundle
              else --- last node
                to.buffer ! bundle
          tag = report
            IF
              proc.id < (MAXnumT8-1)
                PAR
                  requests.out ! bundle
                  to.buffer ! bundle
              else
                to.buffer ! bundle
--------------------------------------------------------------

--- declarations
[bundle.size]BYTE bundle:
INT tag RETYPES [bundle FROM 0 FOR 4]:
VAL buffer.empty IS TRUE:
WHILE TRUE
  SEQ
    to.buffer ? bundle
    IF
      tag = data
        SEQ
          to.calculation ! bundle
          signal ! buffer.empty
      tag = setup
        to.calculation ! bundle
      tag = report
        to.calculation ! bundle


[bundle.size]BYTE bundle:
WHILE TRUE
  SEQ
    from.calculation ? bundle
    from.buffer ! bundle


[bundle.size]BYTE bundle:
WHILE TRUE
  PRI ALT
    from.buffer ? bundle
      results.out ! bundle
    results.in ? bundle
      results.out ! bundle
--------------------------------------------------------------

--- declarations
[bundle.size]BYTE bundle.in, bundle.out:
[]INT setup.array RETYPES [bundle.in FROM 4 FOR 8]:
[]INT report.array RETYPES [bundle.out FROM 0 FOR 8]:
[packet.size]BYTE work.array:
INT tag RETYPES [bundle.in FROM 0 FOR 4]:
INT calcs.per.packet IS setup.array[0]:
INT num.node.bundles:
REAL32 x:


SEQ
  WHILE TRUE
    SEQ
      to.calculation ? bundle.in
      IF
        tag = data
          SEQ
            SEQ i = 1 FOR packets.per.bundle
              SEQ
                [work.array FROM 0 FOR 4] := [bundle.in FROM
                                               (i TIMES 4) FOR 4]
                SEQ j = 0 FOR calcs.per.packet
                  x := x*x
                SEQ j = 0 FOR packet.size
                  bundle.out[j] := 0(BYTE)
            from.calculation ! bundle.out
            num.node.bundles := num.node.bundles + 1
        tag = setup
          SEQ
            num.node.bundles := 0
            x := 0.9999(REAL32)
        tag = report
          SEQ
            report.array[0] := proc.id
            report.array[1] := num.node.bundles
            from.calculation ! bundle.out
APPENDIX C


COORDINATE TRANSFORMATION PROBLEM SOURCE CODE


Only those portions of code different from that of the
generic workfarm are included in this Appendix.


--- global variable file
VAL total.bundles IS 128:
VAL total.packets IS 16384:
VAL packets.per.bundle IS total.packets/total.bundles:
VAL bundle.size IS 385:


--------------------------------------------------------------
PROC generator(CHAN OF ANY to.router, to.handler,
               from.handler, from.scanner)
  --- declarations
  [bundle.size]BYTE bundle:
  []REAL32 setup.array RETYPES [bundle FROM 4 FOR 24]:
  BYTE tag IS bundle[0]:
  REAL32 ASV.pitch IS setup.array[0]:
  REAL32 ASV.az IS setup.array[1]:
  REAL32 ASV.roll IS setup.array[2]:
  REAL32 ASV.x IS setup.array[3]:
  REAL32 ASV.y IS setup.array[4]:
  REAL32 ASV.z IS setup.array[5]:
  INT index, any:
  SEQ
    --- start clock
    to.handler ! 1
    --- initialize nodes
    tag := setup
    ASV.pitch := 0.0(REAL32)  ---asv.pitch
    ASV.az := 0.0(REAL32)     ---asv.az
    ASV.roll := 0.0(REAL32)   ---asv.roll
    ASV.x := 0.0(REAL32)      ---asv.x
    ASV.y := 64.0(REAL32)     ---asv.y
    ASV.z := -8.0(REAL32)     ---asv.z
    to.router ! bundle
    --- generate and send packets
    tag := data
    SEQ i = 0 FOR size
      SEQ
        SEQ j = 0 FOR 128
          SEQ
            index := j TIMES 3
            bundle[index+1] := BYTE i
            bundle[index+2] := BYTE j
            from.scanner ? bundle[index+3]
        to.router ! bundle

    --- request report
    from.handler ? any
    tag := report
    to.router ! bundle


PROC handler(CHAN OF ANY from.router, to.handler,
             from.handler, to.graph, from.graph)


oe ce es eo ee es eo es es ee ee es es ee ee es es ee ee ee ee ee ee es es es ee es es es es es es es es es es es ees ee ee 


--- declarations 

(i268 (1-2 eyes Seasylial plsyer 

[bundle.size]BYTE bundle:
[]INT report.array RETYPES [bundle FROM 0 FOR 8]:
INT node.id IS report.array[0]:
INT node.bundles IS report.array[1]:
VAL ready IS 1:
INT start.time, stop.time:
INT total.bundles:
TIMER clock:
INT x.int, y.int, z.int:
INT any, index:
SEQ 
--- init 


SHO 1 = 70S rOR ws 25 
SEQ j= sO Onwe Zo 
terrain.map[i][j] := 0


WHILE TRUE
  SEQ
    --- start clock on controls command
    to.handler ? any
    clock ? start.time
    --- receive data packets
    SEQ i = 0 FOR 128
      SEQ
        from.router ? bundle
        SEQ j = 0 FOR 128
          SEQ
            index := j TIMES 3
            x.int := INT bundle[index]
            y.int := INT bundle[index+1]
            z.int := (INT bundle[index+2]) - 128
            terrain.map[x.int][y.int] := z.int

    --- stop the timer
    clock ? stop.time

    --- let controller know all done graphing
    from.handler ! ready

    --- make terminal report
    newline(to.graph)
    newline(to.graph)
    write.int(to.graph, (stop.time-start.time)*64, 10)
    newline(to.graph)
    total.bundles := 0
    SEQ i = 0 FOR farmSIZE
      SEQ
        from.router ? bundle
        write.int(to.graph, node.id, 3)
        write.int(to.graph, node.bundles, 10)
        total.bundles := total.bundles + node.bundles
        newline(to.graph)
    newline(to.graph)
    write.int(to.graph, total.bundles, 13)


~-- display wire terrain graph on SONY 
Prize Pit ealtt tude. array: 


VAL XMID rs Use’ 
VAL YBASE ES ‘4 S09) 2 
VAL scalefactor Ls 200000/256: 


DNL rep yew neriz, vert, x.old, y.old, x.new, y.new: 
Pies noble Verel: 


Gil 


SEQ 


--- initialization 
Sposa = Oy IRON Il Ys 
altitude.array[i] := 512 
vert := 0 
to.graph ! c.select bq jcoloume. 
EFOM-gGrapn yy sepa, 
to.graph ! ¢.select. to. colomee 
£rom. Graph, ereeuy 
to.graph "! Serna eae 
from.graph ? reply 
to.graph ! c.select.screen;0 
LLOM.¢rapiiees feo 
to.graph ! ¢.cléar.screenmg 
from.graph ? reply 
to.graph ! c.display .screen a 
From.draphes srepry 
to.graph ! c.move; 2567450 
from.graph ? reply 
SEQ 1° =50 BOR SIZe 
SEQ 
Hi@ia Zee eae 
horizl := (horiz * scaletaeuen, tae 
vertl := (vert * scaletacrer), leer 
x.Old := XMID - vert 
y.Old := YBASE - (vertl + terrain.map[i]J{0]} 
--- process row 
SEO 4) = 1 FORM 27 
SEO 
X.new := (XMID + hoOriz) =avere 
y.new := YBASE - (horizl + (vertl + 
terrain. mMap{ il} (ad 
--- drawline 
ei 
y.new < altivude.array imam 
SEO 
to.graph ! cC.draw. line x oree 
y.old; x.new; y.new 
from.graph ? reply 
altitude.array[j] := y.new 
else 
Shane 
x.Old := x.new 
y.old := y.new 
NOLriZ 23=" NO mia, ee 
horizl := (horiz * scalefactor 7 =. 
vert := vert + 2 
--------------------------------------------------------------

--- bundle declarations
[bundle.size]BYTE bundle.in:
[bundle.size]BYTE bundle.out:
[]REAL32 setup.array RETYPES [bundle.in FROM 4 FOR 24]:
[]INT report.array RETYPES [bundle.out FROM 0 FOR 8]:
BYTE tag IS bundle.in[0]:
INT node.bundles:

--- computation variables
BYTE x.byte, y.byte, z.byte:
INT x.int, y.int, z.int:
REAL32 x.real, y.real, z.real:
REAL32 a,b,c,e,f,g,i,j,k:
REAL32 c4c5, s4c5, d9s8, four, eight, twelve:
REAL32 c4,c5,c6,c7,c8,s4,s5,s6,s7,s8,a7,a9:
REAL32 scanner.az, scanner.pitch, target.range:
REAL32 ASV.x, ASV.y, ASV.z, ASV.az, ASV.pitch, ASV.roll:
REAL32 temp1, temp2, temp3, temp4:
BYTE raw.scanner.pitch, raw.scanner.az, raw.target.range:


--- constants
VAL Pi IS 3.1416(REAL32):
VAL PiBy2 IS Pi/2.0(REAL32):
VAL factor IS Pi/180.0(REAL32):

INT index:
SEQ
  WHILE TRUE
    SEQ
      to.calculation ? bundle.in
      IF
        tag = data
          SEQ
            SEQ z = 0 FOR 128
              SEQ
                index := z TIMES 3

                --- Process raw data packet:
                --- convert pitch into degrees from -15 to -75
                --- convert az into degrees from -40 to +40
                --- convert range into feet from 0 to 32

                temp1 := REAL32 ROUND (INT bundle.in[index+1])
                temp2 := 15.0(REAL32)
                temp3 := 1.47244(REAL32) * temp1
                scanner.pitch := temp1 - (temp2 + temp3)

                temp4 := REAL32 ROUND (INT bundle.in[index+2])
                scanner.az := (temp4 * 0.6299(REAL32)) -
                              40.0(REAL32)

                target.range := (REAL32 ROUND
                                 (INT bundle.in[index+3]))/8.0(REAL32)


                --- do sines and cosines
                scanner.pitch := (factor * scanner.pitch) -
                                 PiBy2
                scanner.az := (factor * scanner.az) - PiBy2

                COSP(c7, scanner.pitch)
                COSP(c8, scanner.az)
                SINP(s7, scanner.pitch)
                SINP(s8, scanner.az)


                --- assign variables
                d9s8 := s8*target.range
                four := (d9s8 * c7) + (a7 * c7)
                eight := (d9s8 * s7) + (a7 * s7)
                twelve := c8*target.range

                --- calculate 3 points of the 4x4 matrix
                x.real := ASV.x + ((a * four) + ((b * eight)
                                   + (c * twelve)))
                y.real := ASV.y + ((e * four) + ((f * eight)
                                   + (g * twelve)))
                z.real := ASV.z + ((i * four) + ((j * eight)
                                   + (k * twelve)))

                --- convert results into bytes
                bundle.out[index] := BYTE (INT ROUND x.real)
                bundle.out[index+1] := BYTE (INT ROUND y.real)
                bundle.out[index+2] := BYTE ((INT ROUND
                                              z.real) + 128)

            from.calculation ! bundle.out
            node.bundles := node.bundles + 1
node.bundles := node.bundles + 1 


        tag = setup
          SEQ
            node.bundles := 0
            ASV.pitch := setup.array[0]
            ASV.az := setup.array[1]
            ASV.roll := setup.array[2]
            ASV.x := setup.array[3]
            ASV.y := setup.array[4]
            ASV.z := setup.array[5]

            --- convert degrees into radians
            ASV.pitch := factor * ASV.pitch
            ASV.az := factor * ASV.az
            ASV.roll := factor * ASV.roll

            --- do sines and cosines
            SINP(s4, (ASV.az + Pi))
            SINP(s5, (ASV.pitch - PiBy2))
            SINP(s6, (ASV.roll + Pi))
            COSP(c4, (ASV.az + Pi))
            COSP(c5, (ASV.pitch - PiBy2))
            COSP(c6, (ASV.roll + Pi))

            --- assign variables
            c4c5 := c4*c5
            s4c5 := s4*c5

            a := (c4c5*c6)+(s4*s6)
            b := c4*s5
            c := (c4c5*s6)-(s4*c6)

            e := (s4c5*c6)-(c4*s6)
            f := s4*s5
            g := (s4c5*s6)+(c4*c6)

            i := s5*c6
            j := -c5
            k := s5*s6

            a7 := -0.5(REAL32)

        tag = report
          SEQ
            report.array[0] := proc.id
            report.array[1] := node.bundles
            from.calculation ! bundle.out
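
The raw-byte conversions at the top of the data branch can be checked numerically. The following Python sketch is an illustration only, not thesis code; the function names are hypothetical, and the constants (15.0, 1.47244, 0.6299, 40.0, 8.0) are taken from the listing. It assumes the raw pitch and azimuth bytes run from 0 to 127, as the generator's loop indices suggest, and that the range byte may use the full 0 to 255 span.

```python
def pitch_degrees(raw):
    """Scanner pitch in degrees, following temp1 - (temp2 + temp3) in the listing."""
    temp2 = 15.0
    temp3 = 1.47244 * raw
    return raw - (temp2 + temp3)


def azimuth_degrees(raw):
    """Scanner azimuth in degrees: (raw * 0.6299) - 40.0."""
    return raw * 0.6299 - 40.0


def range_feet(raw):
    """Target range in feet: raw / 8.0."""
    return raw / 8.0


if __name__ == "__main__":
    for raw in (0, 127):
        print(raw, pitch_degrees(raw), azimuth_degrees(raw))
    print(range_feet(255))
```

Under those assumptions the end points land close to the ranges named in the comments: pitch runs from -15 to about -75 degrees, azimuth from -40 to about +40 degrees, and range stays just under 32 feet.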




--------------------------------------------------------------
PROC scanner(CHAN OF ANY from.scanner)
--------------------------------------------------------------
  --- declarations
  [128][128]BYTE range:
  VAL Pi IS 3.1416(REAL32):
  VAL factor IS Pi/180.0(REAL32):
  REAL32 angle.deg, angle.rad, cosangle, deg.inc:
  BYTE range.byte:
  SEQ
    --- initialize array of range values
    angle.deg := 75.0(REAL32)
    angle.rad := factor * angle.deg
    COSP(cosangle, angle.rad)
    range.byte := BYTE (INT ROUND (64.0(REAL32)/cosangle))
    SEQ i = 0 FOR 128
      SEQ
        SEQ j = 0 FOR 128
          range[i][j] := range.byte
        angle.deg := angle.deg - 0.46875(REAL32)
        angle.rad := factor * angle.deg
        COSP(cosangle, angle.rad)
        range.byte := BYTE (INT ROUND
                            (64.0(REAL32)/cosangle))


=—-= PUNDNCURs Gan cemo alts 
    SEQ i = 0 FOR 128
      SEQ j = 0 FOR 128
        from.scanner ! range[i][j]
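
The range table built by the scanner can be reproduced to check that every entry fits in a BYTE. This is a Python sketch under stated assumptions, not thesis code: the function name is hypothetical, the table is taken to have 128 rows, and Python double precision stands in for the REAL32 arithmetic of the listing, so an entry that falls near a rounding boundary could differ by one.

```python
import math

PI = 3.1416  # same truncated value as the listing


def scanner_range_rows(rows=128, start_deg=75.0, step_deg=0.46875, numerator=64.0):
    """Return one rounded range value per scan row: numerator / cos(angle)."""
    factor = PI / 180.0
    values = []
    angle_deg = start_deg
    for _ in range(rows):
        values.append(int(round(numerator / math.cos(factor * angle_deg))))
        angle_deg -= step_deg
    return values


if __name__ == "__main__":
    table = scanner_range_rows()
    print(table[0], table[-1], max(table))
```

With these assumptions the first row is the largest value and every row stays below 256, so the BYTE conversion in the listing does not overflow. The step of 0.46875 degrees over 128 rows sweeps 60 degrees in total.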




APPENDIX D 
MANDELBROT SET PROBLEM SOURCE CODE 


--- Mandelbrot global constants 


VAL MAXnumtT8 i Srroe 

VAL packetSIZE TS aor 

VAL else IS TRUE:
VAL rSIZE IS 512:
VAL iSIZE IS 512:
VAL rSTEPS IS rSIZE/packetSIZE:
VAL packetCOUNT feeealnro  < LoLae 
wal, COUNTLIMIT iS Zoe 

VAL farmSIZE IS MAXnumT8:
VAL workSIZE IS (farmSIZE * 2) - 1:
VAL data ESme: 

VAL setup 1S ee 

VAL report IS 3:


PROC root(CHAN OF ANY to.graph, from.graph,
          requests, results)
--------------------------------------------------------------
-- internal channels
CHAN OF ANY to.router, from.router:
CHAN OF ANY to.handler, from.handler:
CHAN OF ANY trigger:

wa, Zoom.in Ser On 


Pak ZOOom.out TS 2 
VAL rSIZE.real IS (REAL64 ROUND rSIZE): 
VAL iSIZE.real IS (REAL64 ROUND iSIZE): 


PAR
  generator(to.router, to.handler, from.handler)
  work.router(to.router, trigger, requests)
  results.router(from.router, trigger, results)
  handler(from.router, to.handler, from.handler, to.graph,
          from.graph)

--------------------------------------------------------------


PROC generator(CHAN OF ANY to.router, to.handler, 
from.handler ) 
-- data variables 
[22 | BYTH eager men 
INT tag RETYPES [data.array FROM 0 FOR 4]:
[]INT START RETYPES [data.array FROM 4 FOR 8]: 
INE ESTART [S STARE on 
INT 1START 1S STAG wee 
REAL64 setup.value RETYPES [data.array FROM 4 FOR 8]: 


REAL64 rMIN, rMAX, iMIN, iMAX:
REAL64 rMIN.temp, rMAX.temp, iMIN.temp, iMAX.temp:
REAL64 ul.x.real, ul.y.real, lr.x.real, lr.y.real:
REAL64 rMID, iMID:
INT mode:
INT any, ul.x, ul.y, lr.x, lr.y:
SEQ
  rMIN := -2.0(REAL64)
  rMAX := 0.5(REAL64)
  iMIN := 1.25(REAL64)
  iMAX := -1.25(REAL64)
  WHILE TRUE
    SEQ
      --- initialize nodes
      tag := setup
      setup.value := rMIN
      to.router ! data.array
      setup.value := rMAX
      to.router ! data.array
      setup.value := iMIN
      to.router ! data.array
      setup.value := iMAX
      to.router ! data.array

      to.handler ! 1

      --- send packets
      tag := data
      SEQ i = 0 FOR iSIZE
        SEQ
          iSTART := i
          SEQ j = 0 FOR rSTEPS
            SEQ
              rSTART := j * packetSIZE
              to.router ! data.array

      --- report
      from.handler ? any
      tag := report
      to.router ! data.array


      --- get new plot coordinates
      from.handler ? mode; ul.x; ul.y; lr.x; lr.y
      INT32TOREAL64(ul.x.real,ul.x)
      INT32TOREAL64(ul.y.real,ul.y)
      INT32TOREAL64(lr.x.real,lr.x)
      INT32TOREAL64(lr.y.real,lr.y)
      ul.x.real := REAL64 ROUND ul.x
      ul.y.real := REAL64 ROUND ul.y
      lr.x.real := REAL64 ROUND lr.x
      lr.y.real := REAL64 ROUND lr.y
      IF
        mode = zoom.in
          SEQ
            rMIN.temp := (((rMAX-rMIN)*ul.x.real)
                          /rSIZE.real)+rMIN
            rMAX.temp := (((rMAX-rMIN)*lr.x.real)
                          /rSIZE.real)+rMIN
            iMIN.temp := (((iMIN-iMAX)*lr.y.real)
                          /iSIZE.real)+iMAX
            iMAX.temp := (((iMIN-iMAX)*ul.y.real)
                          /iSIZE.real)+iMAX
            rMIN := rMIN.temp
            rMAX := rMAX.temp
            iMIN := iMIN.temp
            iMAX := iMAX.temp
        mode = zoom.out
          REAL64 scale.factor:
          SEQ
            scale.factor := rSIZE.real/
                            (lr.x.real-ul.x.real)
            rMID := (rMAX + rMIN)/(2.0(REAL64))
            iMID := (iMAX + iMIN)/(2.0(REAL64))
            rMIN := rMID - (scale.factor*(rMID-rMIN))
            rMAX := rMID + (scale.factor*(rMAX-rMID))
            iMIN := iMID - (scale.factor*(iMID-iMIN))
            iMAX := iMID + (scale.factor*(iMAX-iMID))
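
The zoom arithmetic in the generator maps a rectangle selected on the screen back onto the complex plane. The following Python sketch restates that arithmetic for checking; it is an illustration, not thesis code. The function names are hypothetical, bounds are passed as a tuple in the order (rMIN, rMAX, iMIN, iMAX), and the 512 by 512 screen size is an assumption based on the rSIZE constant.

```python
def zoom_in(bounds, ul, lr, r_size=512, i_size=512):
    """Map the selected screen rectangle (ul, lr) onto new plot bounds."""
    r_min, r_max, i_min, i_max = bounds
    ul_x, ul_y = ul
    lr_x, lr_y = lr
    new_r_min = ((r_max - r_min) * ul_x) / r_size + r_min
    new_r_max = ((r_max - r_min) * lr_x) / r_size + r_min
    new_i_min = ((i_min - i_max) * lr_y) / i_size + i_max
    new_i_max = ((i_min - i_max) * ul_y) / i_size + i_max
    return (new_r_min, new_r_max, new_i_min, new_i_max)


def zoom_out(bounds, ul, lr, r_size=512):
    """Expand the bounds about their midpoint by r_size / (selected width)."""
    r_min, r_max, i_min, i_max = bounds
    scale = r_size / (lr[0] - ul[0])
    r_mid = (r_max + r_min) / 2.0
    i_mid = (i_max + i_min) / 2.0
    return (r_mid - scale * (r_mid - r_min),
            r_mid + scale * (r_max - r_mid),
            i_mid - scale * (i_mid - i_min),
            i_mid + scale * (i_max - i_mid))


if __name__ == "__main__":
    start = (-2.0, 0.5, 1.25, -1.25)  # initial values from the listing
    print(zoom_in(start, (0, 0), (256, 256)))
    print(zoom_out(start, (128, 128), (384, 384)))
```

Selecting the whole screen leaves the bounds unchanged, which is a quick sanity check on the mapping. Note that the listing starts with iMIN above iMAX, so screen row 0 corresponds to iMAX.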


PROC handler(CHAN OF ANY from.router, to.handler,
             from.handler, to.graph, from.graph)
--- declarations 
[16 + PpacketSiZe | BYTE Graph aria, ae 
INT v.cC.man RETYPES [Graph careay PROM seer sar 
INT v.pSIZE RETYPES [ograpn. array FROM 4 CRs 
[]BYTE result.array I5 [Graph lareay ERG eee 
(8+packetSIZE) ]: 
INT node.packetsS RETYPES [result .array FROMS0s2Cr eae 
iG wode. Caps RETYPES [result.array FROM 4 FOR wai 
BYTE node.id IS Eesulrewares ne 
IN® “cange, eeply. 
INT n.xX, NV, Moke Me ye x le eee ice 
INT delta .x7 delray: 
BOOL m.1l, M.m, Mer, seleecu. ok seene.scelcerm. 
INT start.time, Sstop.cime, recizer 
INT. tocal.| cops -eGeat ceacwcrs. 
TIMER 2e@er 


iNT any. 
SEQ 

cpp et Le gbsbre abs likyAche aliojel 

V.C.Meanm s=9C .MancdeL one 

V.DSIZE := packetsIZE 

WHILE TRUE 

SEQ 

=== INiLt ‘Graphics display, 
IReln cflesleyal VG. 11de. Curso 
from.gqraph ? reply 
EGwGia pl wc. Iie Cre 
frvom.guapa 2 repli 
tO,aaaon ' c.select.screen ; 0 
LrOmaGgraph =wrepiliy 
1@(@) 10 /)gs) Ol 0 IWe2Glear. SCue> a ae 
from.graph ? reply 
to.graph '-e€. display .screcn mae 
Freon. OGrapa™ 2 rcwuy, 
ele) A [ope hela !' c.select.colour.table; 1 
fromn.Gmaph, 7  cepily 
to.gpapn 1 ¢c.set.colour; CcounthiMi nt sty ease 
from.ggapa? reply 


--- start clock 
to.handler ? any 
clock ? start.time 
--- receive packets
SEQ i = 0 FOR packetCOUNT
SEQ
from.router ? result.array
Lemna tl oaapl.antay 


--- stop clock
clock ? stop.time


--- let controller know all done
from.handler ! 1


--- make terminal report 
newline(to.graph) 
newline(to.graph) 
write.int(to.graph,(stop.time-start.time)*64,10) 
newline(to.graph) 
toed. Loops > = 20 
Poca waenees);— 0 
Se iO POR eieclriticl 2m 
EVE char: 
SHO 
teem souter © resule auray 
Wiehe wit congrapa( DNMenede.id),3) 
write.int(to.graph,node.packets,10) 
Voheee te (eo.qrapa-nmoae. oops ,10) 
total.loops := total.loops + node.loops 
total.packets := total.packets + node.packets 
newline(to.graph) 
newline(to.graph) 
WElvemne( CO.G%apa, cOral.packets, 13) 
Vitro mer eOnGmcmnecOval Woops, 10) 
newline(to.graph) 
WEIlEC Ihe (CO.graph, 

(52s 2/packetSIZE) )-total.packets,13) 
Vorccmiiimiconciaaom, coral. loops/total.packets,10) 
newline(to.graph) 
newline(to.graph) 


-~-- get the new coordinates for calculation 
2 S= RISES Wye Meaone icleyoieclete@ l= 
eO-OGrapn 1 c.cewy.screen; 0 


inom. GicdDlwaeieD Ly 
eonoraute me.selece.f£g.colour; 15 
from.graph ? reply 


--- get the current mouse stats 
Manche .slOw. Cursor 
PooOMm.oLapn ? reply 

done.select := FALSE 

select.ok := FALSE 


oie 


WHILE NOT done.select
SEQ
--- wait for any mouse button to be pressed
to.graph ! c.get.mouse
from.graph ? m.x; m.y; m.l; m.m; m.r
WHILE (NOT m.m) AND ((NOT m.l) AND (NOT m.r))
SEQ
to.graph ! c.get.mouse
from.graph ? N.x7m-y mee mm ci 


LE 
m.m 
BOOL new.select: 
SEO 
m 
Y 


© 
€ 
e 
° 


DHS 
ion oh 
m< KX XK 


x 
x 
y 
ew CU a7 Ree 
~-- process mouse input until middle mouse 
--- is released 
WHILE m.m 
SaIO, 

EO. Graph! 6 -derenetse 

from.graph ? 0. %°-n.y;m-1 mim ee 

re 

((n.x <> 1.x) OR (m.y <> Diy eee 
new.select 


SEQ 
new.select := FALSE 
to.graph ! ¢. Aide veuesun 
£romM. graph; snep ly 
to.graph ! C.copy. seucen as 
from.graph ? reply 
--- set new corner coordinates 
deltavx 22 ae 
deltasy ess Ney soe 
IF
  delta.x > 0
    IF
      delta.y > 0
        IF
          delta.x > delta.y
            delta.x := delta.y
          TRUE
            delta.y := delta.x
      TRUE
        IF
          delta.x > (-delta.y)
            delta.x := -delta.y
          TRUE
            delta.y := -delta.x
  TRUE
    IF
      delta.y > 0
        IF
          (-delta.x) > delta.y
            delta.x := -delta.y
          TRUE
            delta.y := -delta.x
      TRUE
        IF
          (-delta.x)>(-delta.y)
            delta.x := delta.y
          TRUE
            delta.y := delta.x
lowe 1. xX 
li Vee oT) 3 V7. 
wor graphn ! Csdraw.rectangle; 
m.x; m.y; delta.x; delta.y 
Paemeograpi,, reply 


EGCG! we SOW. CUrSOG 
from.graph ? reply 
TRUE 
SKIP 


--- order the screen coordinates for 
--- proper range 


ies] mex + delta.x 
ee) 
lth ae elie X 
SEQ 
nee f= ee 
Mm. xe 1x 
ee Soe ee DS 
TRUE 
SKIP 
iy = m.y + delta.yv 
a iick 
Mey coal ay 
SEQ 
ey as = aly, 
Me Vis olay 
ds =n.y 
TRUE 
SKIP 


a 


select.ok := (delta.x <> 0) AND 
(delta.vy=~ 2. 


-~-- right mouse button hit, do zoom in 
Mee AND Sselecesen 


1a, 0) 
done.select := TRUE 
Qraph.results ! zZoom;in; sine eee = 


1. Xe 


-~-- left mouse button hit, do zoom out 
m.l AND select” ok 


SEQ 
cone -selece <= 71 ke 
graph.results ! Zoom.out; 9m. sn 
lL Xe 
TRUE 
Secale 
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